A TEACHER OBSERVATION INSTRUMENT AND
By

Virginia  D.  Sharpe 
April  1988 

Chairman:  Gordon  D.  Lawrence 

Major  Department:  Educational  Leadership 

The  purpose  of  this  study  was  to  develop  a formative 
observation  instrument  and  a training  package  for  use  in 
observing  physical  education  teachers  in  an  activity-class 
setting.  The  instrument  was  based  upon  the  Florida 
Performance  Measurement  System  Domain  3.0,  Instructional 
Organization  and  Development,  formative  observation 
instrument.  A survey  of  physical  education  research  studies 
in  the  areas  of  pedagogy  and  motor  skills  acquisition  led  to 
the  selection  of  observation  indicators  in  five  categories: 
efficient  use  of  time,  lesson  development,  demonstrations, 
supervised  performance,  and  feedback.  A training  package 
was  developed  and  field  tested.  It  included  the  following: 
a summary  of  the  research  base,  a coding  manual,  a practice 
test, audiotapes and videotapes for coding practice, and a
trainer's manual. A criterion accuracy test (observing a
videotape  using  the  instrument)  and  a multiple-choice  test 
were  used  to  evaluate  the  effectiveness  of  the  training. 

To validate the training package, participants were
randomly selected for inclusion in one of three groups: two
treatment groups and a nontreatment control group. One
treatment session was conducted by the developer; the other
was conducted by a participant from the first training
session, using the training materials that had been
developed.

The primary means of data analysis was ANOVA, with alpha
set at .05. A Bonferroni t test was conducted following
significant F ratios. The results on the multiple-choice
test indicated a positive treatment effect for the two
training groups, although the effect was not significant.
An analysis of the criterion accuracy scores (on the
videotape) indicated a significant difference in favor of
the treatment groups on 5 of the 10 criterion accuracy
measures. Participants gave high ratings to the training
materials and to both trainers.

The following conclusions were reached: the training
process was effective, observers should receive all
necessary information in writing, the instrument might be
used in teacher-effectiveness studies, and the indicators
appear to be associated with student achievement.

CHAPTER I
INTRODUCTION 

In 1983 the State of Florida implemented the
Beginning Teacher Program, which required each new teacher
to complete a program of evaluation and assistance before
receiving a Florida teaching certificate (Florida Department
of Education [DOE], 1986a; Florida Statutes, Chapter
231.17). Each beginning teacher was assigned a three-member
team consisting of a building-level administrator, a peer
teacher, and one other educator. The team had the task of
evaluating and assisting the teacher using the instruments
of the Florida Performance Measurement System (FPMS).

The instruments of the FPMS were developed from a
review of the current process-product research on effective
teaching. The FPMS documented 121 specific teacher
behaviors that were shown through research to be directly
related to increased student achievement and improved
classroom conduct (Florida DOE, 1983). These 121 behaviors,
identified as part of the teaching process, were categorized
into six domains for the purposes of the FPMS:

Domain 1.0, Planning
Domain 2.0, Management of Student Conduct
Domain 3.0, Instructional Organization and Development
Domain 4.0, Presentation of Subject Matter
Domain 5.0, Communication: Verbal and Nonverbal
Domain 6.0, Testing

The Florida Performance Measurement System consisted
of seven instruments that focused on teacher behavior: six
formative instruments (one for each of the domains) and a
summative instrument (Florida DOE, 1983). The building-level
administrator was directed to use the summative
evaluation  instrument  at  the  beginning  of  the  program  as  a 
diagnostic  tool  and  again  at  the  end  as  an  evaluation  tool. 
The  peer  teacher  and  the  other  educator  were  to  use  the 
formative  evaluation  instruments  throughout  the  program  in 
efforts  to  help  the  teacher  improve  his  or  her  performance. 
The  summative  instrument  covered  generalized  behaviors  from 
four  of  the  domains,  while  each  of  the  formative  instruments 
covered  a greater  variety  of  specific  teaching  behaviors 
from  a particular  domain. 

The  Nature  of  the  Problem 
The concepts and indicators used in the FPMS
instruments were selected by the Beginning Teacher Program
developers because they appeared to be generic
effective-teaching behaviors applicable across subject areas
and/or grade levels. This apparent general applicability
was indicated by the distributions obtained in a study in
which the summative instrument was used to observe
1,223 teachers in 45 schools and 13 Florida school districts
(Peterson, 1985). However, while using the formative
instruments  with  physical  education  teachers  in  an  activity 
setting,  observers  noted  limitations.  While  the  teaching 
behaviors  selected  for  the  FPMS  were  appropriate  for 
physical  education  teachers  in  a classroom  setting,  there 
were  many  effective  teaching  behaviors  observable  in 
physical  activity  instruction  which  were  not  included.  This 
omission  was  particularly  evident  in  Domain  3.0, 
Instructional  Organization  and  Development.  The  problem 
that  prompted  this  study  was  the  apparent  lack  of  fit  of  the 
FPMS  to  active  physical  education  instruction,  and  the  need 
for  systematic  identification  of  teaching  behaviors  for  that 
instruction  that  could  supplement  the  FPMS  categories. 

Statement  of  the  Problem 

This study addressed three problems. The first was to
identify effective teaching behaviors from motor skill
and/or physical education activity settings. The second was
to develop a teacher observation instrument, based on the
identified research, designed to measure a teacher's
instructional performance in a physical education activity
setting. The third was to develop a package for training
observers to use the instrument and to train a group of
observers to test the usability of the instrument.


Definition  of  Terms 

Terms  Relating  to  the  Observation  Process 

Formative  observation.  Formative  observation  is 
defined  as  an  observation  conducted  for  the  primary  purpose 
of  determining  the  specific  behaviors  a teacher  fails  to 
demonstrate  to  an  acceptable  degree.  This  determination  is 
followed  by  providing  the  teacher  with  support  in  areas 
needed  for  improvement  of  instruction.  The  formative 
observation  is  not  to  be  used  to  make  decisions  regarding 
hiring or dismissal (Florida DOE, 1986b).

Summative observation. Summative observation is an
observation conducted for the purpose of gathering data to
be used in making decisions such as retention, promotion, or
dismissal of a teacher (Darling-Hammond, Wise, & Pease,
1983).

Interobserver agreement. Interobserver agreement is
the degree to which two or more observers working
independently agree upon the recording of the same indicator
of teacher behavior. Interobserver agreement is viewed as
evidence of objectivity. It is not synonymous with the
reliability of observational measures. For example,
observers can agree nearly perfectly yet collect very
unreliable data if the behaviors of those being observed
differ little or if the behaviors are truly unstable from
occasion to occasion (Frick & Semmel, 1978).
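
Interobserver agreement as defined above can be illustrated with a small computation. The sketch below is illustrative only. It assumes, hypothetically, that each observer's record is a list holding one indicator code per coding interval; the function name and the sample codes are invented for the example and are not taken from the instrument.

```python
def percent_agreement(record_a, record_b):
    """Return the proportion of intervals coded identically by two observers."""
    if len(record_a) != len(record_b):
        raise ValueError("both records must cover the same intervals")
    if not record_a:
        raise ValueError("records must not be empty")
    matches = sum(1 for a, b in zip(record_a, record_b) if a == b)
    return matches / len(record_a)


# Hypothetical codes for five intervals of one lesson (not real data).
observer_1 = ["feedback", "demonstration", "feedback", "practice", "feedback"]
observer_2 = ["feedback", "demonstration", "practice", "practice", "feedback"]

agreement = percent_agreement(observer_1, observer_2)
print(f"Interobserver agreement: {agreement:.2f}")
```

A high value from a computation of this kind shows only that the observers coded alike. It does not show that the records distinguish one teacher from another or that the behaviors are stable across occasions, which is why agreement is not treated as reliability.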

Reliability.  Reliability  is  the  degree  of 
consistency  with  which  an  instrument  measures  whatever  it  is 
measuring  (Ary,  Jacobs,  & Razavieh,  1979).  Records  made  by 
different  observers  on  different  occasions  in  the  same 
classrooms,  when  the  teacher  is  working  at  equivalent  tasks, 
should  be  identical  or  nearly  so. 

Stability. Stability is the degree to which
observers consistently record an indicator of teacher
behavior when it is displayed on two or more occasions (Ary,
Jacobs, & Razavieh, 1979; Darling-Hammond, Wise, & Pease,
1983).

Second-level  training.  Second-level  training  is 
training  that  is  done  by  someone  other  than  the  developer  of 
the  instrument,  generally  someone  who  has  been  through  the 
training  program. 

Validity. Validity is the extent to which differences
in scores yielded by a category observation instrument
reflect actual differences in behavior (Medley & Mitzel,
1963). The term validity means that an instrument really
measures what it claims to be measuring.

Observer  validity.  Observer  validity  means  the 
degree  to  which  the  frequencies  in  an  observer's  record 
agree  with  the  actual  frequencies  of  occurrence  of  the  items 
or categories of behavior recorded (Medley, Coker, & Soar,
1984).
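
A frequency comparison of this kind can be sketched as follows. The scoring rule used here, the ratio of the smaller to the larger frequency for each indicator, is an assumption chosen for illustration and is not presented as the scoring method of this study. The indicator names and tallies are hypothetical.

```python
def frequency_agreement(observed, criterion):
    """Score each indicator as smaller frequency / larger frequency.

    A score of 1.0 means the observer's tally matched the criterion tally
    exactly. Indicators absent from the observer's record count as zero.
    """
    scores = {}
    for indicator, actual in criterion.items():
        recorded = observed.get(indicator, 0)
        larger = max(recorded, actual)
        # Two zero tallies agree perfectly; avoid dividing by zero.
        scores[indicator] = 1.0 if larger == 0 else min(recorded, actual) / larger
    return scores


# Hypothetical tallies for one videotaped lesson (not real data).
criterion_tally = {"specific feedback": 10, "demonstration": 4, "off-task time": 0}
observer_tally = {"specific feedback": 8, "demonstration": 4, "off-task time": 0}

scores = frequency_agreement(observer_tally, criterion_tally)
for indicator, score in scores.items():
    print(f"{indicator}: {score:.2f}")
```

Scoring each indicator separately, rather than pooling the tallies, keeps an overcount on one indicator from masking an undercount on another.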

Content  validity.  Content  validity  is  the  degree  to 
which  the  items  on  an  instrument  sample  the  behaviors  about 
which conclusions are to be drawn (Ary, Jacobs, & Razavieh,
1979).

Construct  validity.  Construct  validity  is  the 
degree  to  which  the  theoretical  claims  and  supports  of  an 
instrument  are  substantiated  both  logically  and  empirically 
(Ary,  Jacobs,  & Razavieh,  1979). 

Criterion-related  validity.  Criterion-related 
validity  describes  the  relationship  between  scores  on  the 
instrument  in  question  and  scores  or  performance  on  some 
other variable used as a criterion (Ary, Jacobs, & Razavieh,
1979).

Face  validity.  Face  validity  refers  to  the  degree 
to  which  an  instrument  appears  to  measure  or  describe  what 
it purports to measure or describe (Ary, Jacobs, & Razavieh,
1979).

Consensual validity. Consensual validity results
when  a number  of  people  examine  an  instrument  and  express 
their  opinion  about  whether  each  item  measures  what  the 
developer  wants  it  to  measure  (Crowl,  1986). 

Halo  effect.  Halo  effect  is  the  tendency  for  an 
observer  to  rate  the  person  being  observed  on  the  basis  of 
an  overall  impression  of  how  competent  the  person  being 
rated  is,  thus  resulting  in  high  positive  correlations 
between  ratings  of  presumably  independent  characteristics 
(Medley,  Coker,  & Soar,  1984). 

Terms  Relating  to  the  Instrument 

Protocols. Protocols are representations of
reality that illustrate an idea or concept (e.g., scripts,
audiotapes, videotapes). They are intended to prepare an
observer for functioning in a real situation (Roderick,
1975).

Concept. A concept is a verbal construct that
represents a number of specific, related behaviors, which
are explicated by indicators and examples.
Concepts form the classification system for organizing the
many indicators on a formative instrument (Florida DOE,
1986b).

Indicator. Indicator is the term describing a
concrete, objective, observable instance of teacher
behavior. The behavior may be classified as effective or
ineffective (Florida DOE, 1986b).

Principle. A principle is a clear and concise if-then
statement summarizing the effect reported in the research
that precedes it (Florida DOE, 1986b).

Objectivity.  Objectivity  is  the  term  used  when  the 
categories  which  comprise  an  instrument  are  defined  so  that 
the  definitions  are  directly  reducible  to  empirical 
realities  (Martin,  1977). 

Relevance. Relevance is the term used when there is
a reasonable  likelihood  that  behaviors  on  an  observation 
instrument  will  occur  in  the  contexts  being  observed 
(Martin,  1977). 

Parsimony.  Parsimony  refers  to  having  a reasonable 
degree  of  simplicity  in  the  recording  process.  Complexities 
in  instrument  construction  can  severely  limit  the  practical 
utility  of  the  category  system  (Martin,  1977). 

Efficient. Efficient is the term that describes an
instrument  when  all  categories  are  employed  in  the  recording 
process  and  practical  discriminations  between  categories  are 
relatively  salient  (Martin,  1977). 

Assumptions 

The major assumptions underlying this entire study
are (a) that it is possible to identify specific physical
education teacher behaviors and (b) that there is value in
the process. Other assumptions involve the construction of
the instrument, the identification of teacher behavior, the
effectiveness of formative observations in changing teacher
behavior, and the premise that observers trained to
criterion agreement on an instrument while observing a
videotape will retain about the same level of agreement when
observing live events.

A researcher who develops a teacher observation
instrument makes some assumptions regarding the
identification of teacher behavior. Among these assumptions
are the following: it is appropriate to evaluate the
performance (instructional behavior) of a teacher; it may be
more valid to evaluate the teacher in this manner than
on the basis of student outcomes or personal
characteristics; and the behavior of a teacher can be
effectively categorized after viewing one or more periods of
instruction.

When an instrument is used for evaluation purposes,
additional assumptions are made as follows: teacher
behavior causes, or correlates with, student achievement;
the behaviors tallied for a teacher are representative of
the teacher's universe of behaviors; and the behaviors
tallied as effective or ineffective were used effectively or
ineffectively during the instruction that was observed.
The FPMS and other observation systems that are based upon
teacher behavior rest upon these latter
assumptions. These systems have generally been the basis
for process-product research and the conclusion drawn from
it: that what teachers do in classrooms affects student
achievement (Centra & Potter, 1980; Medley, 1979; Rosenshine
& Furst, 1971).

Another  assumption  is  that  teacher  behavior  can  be 
changed  through  a process  which  includes  the  observation  and 
identification  of  effective  and  ineffective  teaching 
behaviors  followed  by  counseling  and  assistance  to  improve 
the  teacher's  performance.  This  assumption  is  supported  by 
Bandura  (1982),  who  said  that  "people  often  do  not  behave 
optimally,  even  though  they  know  full  well  what  to  do.  This 
is  because  self-referent  thought  also  mediates  the 
relationship between knowledge and action" (p. 122).
Although Darling-Hammond, Wise, and Pease (1983) indicated
that  perceived  self-efficacy  is  a strong  predictor  of 
subsequent  behavior,  they  also  suggested  that  an 
individual's  sense  of  efficacy  can  be  influenced  through 
interactions  with  others. 

A final  assumption  related  to  this  investigation  is 
that  observers  trained  to  criterion  agreement  on  an 
instrument  while  observing  a videotape  will  retain  about  the 
same level of agreement when observing live events. Medley
and Norton (cited by Frick & Semmel, 1978) concluded that
investigators  need  only  document  that  their  observers  were 
competent  upon  completion  of  training,  and  that  competency 
could  be  determined  by  observer  agreement  on  unambiguous 
examples  of  behavioral  categories  shown  on  videotape. 

Delimitations  of  the  Study 
Until  the  present  century,  most  teacher  evaluation 
was  based  on  rating  teachers  on  identified  personal 
qualities  widely  believed  to  distinguish  more  effective  from 
less  effective  teachers  (Medley,  Coker,  & Soar,  1984).  The 
qualities  were  identified  by  asking  questions  of  educators; 
in  this  way  it  was  found  that  more  effective  teachers  were 
better  informed  about  subject  content;  were  brighter;  had 
different  values,  attitudes,  and  personality 
characteristics;  and  conveyed  a sense  of  presence  that  was 
different  from  a less  effective  teacher.  Other  evaluation 
methods  used  included  the  use  of  supervisors'  ratings, 
average  gains  in  achievement  of  pupils,  a standardized  test 
battery,  and/or  a teacher  competency  test.  Although  the 
research  on  teacher  effectiveness  began  almost  100  years 
ago,  it  has  not  been  until  the  last  20  or  30  years,  when 
researchers  abandoned  the  rating  scale  and  other  teacher 
evaluation  methods  and  began  to  use  structured  observation 
schedules, that significant relationships between classroom
behavior and pupil learning gains were clearly noted and a
substantial  number  of  generalizable  teacher  behaviors  were 
established  (Medley,  1977).  The  significance  of  these 
studies  has  led  to  an  increased  interest  in  structured 
observation  systems. 

The  development  of  a structured  observation  system 
is  a slow  process,  involving  many  tryouts  and  revisions 
(Medley,  Coker,  & Soar,  1984).  A long  period  of 
experimental  use  is  recommended,  although  it  may  be 
shortened  somewhat  by  obtaining  expert  assistance  both  in 
the  initial  steps  and  during  the  refinement  phase.  The 
first  step  is  to  identify  a set  of  behavioral  dimensions 
which  would  be  different  when  comparing  an  effective  with  an 
ineffective  teacher.  One  of  the  three  strategies  which  can 
be  used  to  choose  dimensions  to  evaluate  is  to  review  the 
process-product  teacher  effectiveness  research  and  select 
those  dimensions  of  teacher  behavior  which  have  been  shown, 
by  research,  to  be  related  to  effective  instruction.  Once 
the  dimensions  of  behavior  have  been  identified,  the  long 
process  of  definition,  redefinition,  and  re-redefinition 
begins--a  process  which  may  go  on  for  months  or  years. 
Identification  of  a behavior  includes  the  identification  of 
teacher  behaviors  which  require  a minimum  of  observer 
inference. Medley, Coker, and Soar emphasized that the
simpler the definition, the longer it takes to arrive at it.
Also involved in the process of completing an observation
instrument are the method of training observers, the
establishment of the validity of the instrument and the
observers, and the reliability of the instrument.

The  long-range  development  process  for  this 
instrument  is  as  follows: 

1. Identification of the dimensions of performance
   which are to be evaluated.

2. Definition and clarification of those dimensions through
   a. Choosing the dimensions which will be evaluated, and
   b. Defining each dimension through the selection of
      specific low-inference teacher behaviors.

3. Development of the instrument
   a. Selection of categories and indicators,
   b. Method of recording (i.e., category, sign, or
      multiple), and
   c. Frequent tryouts and revisions.

4. Identification of observer effects and inferences
   (e.g., coder drift, halo effect).

5. Development of a method for training observers through
   the following:
   a. Training materials,
   b. Coding manual, and
   c. Evaluation of observer competence.

6. Establishment of the validity of the instrument.

7. Establishment of the reliability of the instrument.

In this dissertation, this development is traced through
step 5, and a limited level of validity (step 6) is
established. The work began with a survey of teacher
effectiveness research in physical education activity
classes and/or motor skills instructional settings. The
dimensions of teacher behavior chosen for the instrument
were then defined, and low-inference observable behaviors
were described for the observer. After the actual observation
instrument was designed, a training manual, a training
process, and a method of evaluating the observers were
developed. A limited level of instrument validity was
established.

The  quality  of  a test,  or,  in  this  case,  an 
observation  instrument,  is  usually  judged  in  terms  of  its 
validity  and  reliability.  A test  is  valid  "to  the  extent 
that  it  measures  what  it  claims  to  be  measuring,  and  it  is 
reliable  to  the  extent  that  whatever  it  measures  it  measures 
consistently" (Crowl, 1986, p. 94). Crowl noted that
typically it is infinitely more difficult to determine a
test's validity than its reliability.

Face  validity  is  usually  considered  the  most  basic 
form  of  validity  necessary  for  any  instrument,  although  some 
researchers  have  stated  that  face  validity  is  really  not  a 
form  of  validity  at  all  (Crowl,  1986).  It  stands  to  reason 
that  teachers  will  be  more  willing  to  be  observed  by  an 
instrument  which  contains  items  that  appear  relevant  to  the 
process  of  teaching  than  they  will  if  the  instrument  appears 
to  have  no  relevance  to  the  teaching  process.  Face  validity 
for  the  instrument  developed  as  a part  of  this  study  has 
been  established  by  a method  recommended  by  Medley,  Coker, 
and  Soar  (1984).  They  suggested  that  a tentative  set  of 
behavior  items,  developed  from  theory,  be  checked  against 
their  research  tables  (summaries  of  the  effective  teaching 
literature)  to  see  whether  there  was  empirical  support  for 
the  relevance  of  each  item.  "The  surviving  set  of  items 
should possess face validity" (p. 74). In addition to
this check against their research summaries, an independent
literature search of physical education and motor skills
learning was conducted, and the findings were compared to
each of the items.

An  important  concept  to  keep  in  mind  is  that  an 
instrument  is  valid  only  for  a specific  purpose  with  a 
specific group of people (Crowl, 1986). Since this
observation instrument is being developed for use in a
teacher  evaluation  system,  the  indicators  selected  must  be 
consistent  with  the  real  world,  thus  requiring,  at  a 
minimum,  content  and  consensual  validity.  Wiersma  (1969) 
stated  that  the  determination  of  content  validity  usually 
involves  detailed  analyses  of  content  and  the  opinions  of 
experts  in  the  field.  For  this  study,  content  validity  was 
established  by  a study  of  the  teacher  behaviors  which  might 
be  expected  during  an  active  learning  situation.  These 
behaviors  were  identified  through  a review  of  research 
studies,  textbooks  on  teaching  physical  education,  and 
articles on teaching selected from physical education
journals.

Although  Crowl  (1986)  stated  that  validation  by 
consensus  is  a weak  form,  he  said  that  "consensual  validity 
is  better  than  no  validity"  (p.  109).  The  consensual 
validity  of  this  instrument  has  been  established  through  a 
review  of  the  instrument  by  professionals  charged  with  the 
duty  of  supervising  physical  educators.  Their  comments  and 
suggestions  have  been  incorporated  into  the  instrument.  In 
addition  to  consensual  validation,  Crowl  stated  that  the 
developer  should  administer  the  instrument  through  a tryout 
process,  thus  revealing  ambiguities  and  weaknesses  which 
were not initially apparent. The observation instrument has
gone through a tryout process leading to many changes in the
selection of concepts and indicators and the layout of the
instrument.


Limitations  of  the  Study 
A key  element  in  the  instructional  system  is  the 
teacher;  therefore,  much  of  the  current  evaluation 
development  effort  has  been  focused  on  the  teacher's 
performance.  The  results  of  process-product  studies  have 
been  used  to  make  statements  about  effective  and  ineffective 
ways  to  teach  (Brophy  & Evertson,  1976),  and  a variety  of 
books,  programs,  and  professional  articles  have  been  written 
and  produced  based  on  the  results  of  these  studies.  The 
great  majority  of  these  publications  has  concentrated  on  the 
evaluation  of  teacher  behavior,  in  particular,  the 
systematic  observation  of  classroom  behavior.  As  noted 
previously,  some  researchers  have  concluded  that  what 
teachers  do  in  the  classroom  affects  students'  performances 
and  claim  that  discrete  sets  of  teacher  behaviors  can  lead 
to  increased  student  performance  (Centra  & Potter,  1980; 
Medley,  1979;  Rosenshine  & Furst,  1971).  Other  researchers, 
however, have contradicted these conclusions. After
examining the Beginning Teacher Evaluation Study conducted
for California's Commission for Teacher Preparation and
Licensing, the most extensive process-product study of
teacher effectiveness to date, Bush (1979) concluded that it
was not possible, at the time, to link precise and specific
teacher behavior to precise and specific learning of pupils
(the original goal of the inquiry), although it had been
determined that teacher behaviors do affect student
achievement. Darling-Hammond, Wise, and Pease (1983) stated
that at best, "the teaching performances advanced as having
consistently positive effects on student achievement are
relatively broad constructs rather than discrete, specific
actions of teachers" (p. 293). Centra and Potter qualified
their conclusion regarding the correlation between teacher
behavior and student achievement by reporting that

The  variable  closest  in  the  causal  chain  to  student 
achievement  is  student  behavior.  Teacher 
performance  is  seen  as  a substantial  contributor  to 
the  variance  in  student  behavior,  and  should  be 
expected  to  correlate  more  highly  with  student 
behavior  than  with  student  achievement.  (p.  287) 

For  the  purposes  of  this  study,  the  thesis  that 
teacher  instructional  behavior  does  cause,  or  at  least 
correlate  with,  student  achievement  was  accepted  as  valid. 
The  truths  evident  in  the  conflicting  viewpoints  should  give 
rise  to  a hesitation  in  seeking,  or  requiring,  only  the 
specific  behaviors  listed  on  this  formative  instrument. 
Although  the  research  linking  teacher  behavior  to  increased 
student  achievement  has  been  persuasive  enough  to  lead  to 
the  development  of  this  and  other  teacher-behavior 
observation instruments, there are other forces at work in
the teaching-learning process which affect student
achievement  outcomes.  Centra  and  Potter  (1980)  pointed  out 
this  caution  when  they  stated,  "Student  achievement  is 
affected  by  a considerable  number  of  variables,  of  which 
teacher  behavior  is  but  one"  (p.  287).  The  categories  on 
the observation instrument developed as a part of this study
closely match the definition of broad constructs which have
consistently positive effects on student achievement.
Neither the categories nor the more specific behavioral
indicators should be strictly interpreted to mean that
student achievement will necessarily increase if the
behaviors listed are used by a teacher. Similarly, they
should not be interpreted to mean that student achievement
will necessarily decrease if the behaviors are not used
by a teacher.


Remaining  Chapters 

Chapter  II  is  a review  of  the  literature  pertaining 
to  the  teaching  of  skills  in  a motor  skill  or  physical 
education  activity  setting.  Chapter  III  is  a review  of  the 
literature  dealing  with  training  and  evaluation  methods, 
materials,  and  models.  Chapter  IV  is  a description  of  the 
development  and  design  of  the  instrument,  the  training,  the 
evaluation materials, and the data collection. Chapter V
includes the analysis and interpretation of
data, and Chapter VI contains the summary, conclusions,
implications, and recommendations.


CHAPTER  II 

REVIEW  OF  THE  PHYSICAL  EDUCATION/ 

MOTOR  SKILLS  LITERATURE 

This chapter includes a survey of the literature
documenting specific teacher behaviors that have been
shown through research to be directly related to increased
student achievement. Analysis of these studies provided the
indicators and categories that were used as the basis of
the teacher observation instrument developed for
this study. The research has been extracted from studies
dealing with the learning of motor skills or teaching
strategies used in physical education activity classes.
Since the instrument was adapted from one of those developed
for the Florida Performance Measurement System (FPMS) (see
Appendix A), some indicators that appeared pertinent to an
activity setting were kept on the instrument even though
supporting research from the areas of physical education or
motor skills was not found. In these cases, applicable
research from other areas of effective teaching has been
reported.

The  review  is  organized  in  five  sections.  Included 
in  the  first  section  is  the  research  base  for  the 
observation  instrument  indicators  in  the  Efficient  Use  of 
Time  category.  The  second  section  details  the  research
supporting  the  Lesson  Development  portion  of  the  instrument. 
Selected  literature  concerning  demonstrations  is  reviewed  in 
the  third  section.  The  fourth  section  covers  the  literature 
on  practice,  and  the  fifth  section  is  a review  of  the 
literature  regarding  teacher  feedback.  Each  of  these 
sections  includes  both  effective  and  ineffective  teacher 
behaviors  and  concludes  with  a summary  and  one  or  more 
statements  of  principle  drawn  from  the  research  which 
underlies  the  indicators  and  categories  selected  for 
inclusion  on  the  observation  instrument.  The  chapter
closes  with  a final  section  on  the  generalizations  that
can  be  drawn  from  the  review  of  the  literature.

Efficient  Use  of  Time 

The  importance  of  the  efficient  use  of  classroom 
time  has  been  well  documented  (Brophy  & Evertson,  1976; 
Fisher,  Berliner,  Filby,  Marliave,  Cahen,  & Dishaw,  1980; 
Florida  DOE,  1983;  Rutter,  Maughan,  Mortimore,  Ouston,  &
Smith,  1979)  with  the  conclusion  that  more  time-on-task  is 
associated  with  greater  student  achievement  and  less
disruptive  behavior.  The  value  of  increased  time-on-task  in
the  physical  education/motor  skill  field  also  has  been  amply
documented  (Godbout,  Brunelle,  & Tousignant,  1983;  Graham  &
Heimerer,  1981;  Siedentop,  Birdwell,  & Metzler,  1978).

In  their  study  of  academic  learning  time  (ALT)  in 
physical  education,  Godbout,  Brunelle,  and  Tousignant  (1983) 
pointed  out  the  specific  importance  of  keeping  students 
involved  in  physical  education  classwork.  They  ascertained 
that  when  the  students  in  their  study  were  effectively 
engaged  in  physical  education  content  activities,  they  had  a 
very  high  degree  of  success  in  assigned  tasks.  Time  studies 
on  the  amount  of  ALT  which  occurs  in  average  physical 
education  classes  have  generally  reported  a low  percentage 
of  student  involvement  with  learning  tasks.  Godbout, 
Brunelle,  and  Tousignant  reported  an  average  of  30%  ALT  in 
classes  observed,  with  19%  to  34%  of  the  class  time  spent  in 
other  than  physical  education  content  activities.  During 
the  time  that  an  entire  class  was  involved  in  physical 
education  content  activities,  students,  considered 
individually,  were  on-task  approximately  50%  of  the  time. 
This  means  that  an  individual  student  was  on-task 
approximately  15%  of  the  instructional  time.  Costello 
(cited  by  Gusthart  & Rink,  1983),  in  a study  of  elementary 
children,  also  found  that  less  than  one-third  of  the 
instructional  time  was  devoted  to  motor  activity  and  60%  was 
spent  either  waiting  or  listening  to  the  teacher.  Anderson 
and  Barrette  (1978)  reported  on  their  study  of  193 
elementary  school  students  who  were  observed  collectively
for  a total  of  168,454  seconds  in  physical  education 
classes.  The  students  spent  slightly  less  than  two-thirds 
(63.2%)  of  the  total  time  observed  in  nonmovement  behavior. 
This  two-thirds  included  two  behaviors:  listening  to
instruction,  25.4%  of  their  time,  and  waiting  to
participate,  35.4%  of  their  time.  Hawkins,  Wiegand,  and
Landin  (1985)  affirmed  that  "descriptive  student  behavior 
studies  revealed  that  waiting  is  one  of  the  most  predominant 
activities  in  the  gymnasium"  (p.  250). 
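
The 15% figure attributed above to Godbout, Brunelle, and Tousignant (1983) follows from multiplying the two proportions they reported. The sketch below, in Python, restates that arithmetic and also sums the two nonmovement components reported by Anderson and Barrette (1978). It assumes the proportions can be combined by simple multiplication and addition; the function and variable names are illustrative and do not come from either study.

```python
def individual_on_task_share(class_alt_share, individual_on_task_rate):
    """Share of instructional time in which one student is on-task.

    class_alt_share: share of class time spent in content activities (ALT).
    individual_on_task_rate: share of that time an individual is on-task.
    """
    return class_alt_share * individual_on_task_rate


# Godbout, Brunelle, and Tousignant (1983): 30% ALT, 50% individual on-task.
share = individual_on_task_share(0.30, 0.50)
print(f"individual on-task share: {share:.0%}")

# Anderson and Barrette (1978): components of nonmovement behavior (percent).
listening = 25.4
waiting = 35.4
nonmovement_total = 63.2

named_components = listening + waiting
remainder = nonmovement_total - named_components
print(f"listening + waiting: {named_components:.1f}%")
print(f"not itemized in the text: {remainder:.1f}%")
```

The two itemized behaviors account for most, but not all, of the reported nonmovement total; the text does not say what makes up the remainder.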

What  are  the  behaviors  which  affect  the  amount  of 
ALT  in  physical  education  classes?  Beamer  (1982),  who 
reported  that  academic  learning  time — physical  education 
(ALT-PE)  in  his  study  averaged  only  15%,  stated  that  ALT-PE 
appeared  to  be  affected  primarily  by  the  nature  of  the 
activity,  the  amount  of  activity  time  available,  and  the 
efficient  use  of  activity  time.  Hawkins,  Wiegand,  and 
Landin  (1985),  in  a study  to  determine  situations  which 
contribute  to  or  detract  from  the  maintenance  of  task 
attention  in  physical  education,  cited  high  management  as  a 
detractor.  They  reported  that  an  important  component  in  the 
efficient  use  of  time  was  the  amount  of  time  spent  on 
management  and  determined  that  "good  teachers  are  nearly 
always  good  managers,  though  they  may  not  spend  much  time 
managing.  Indeed,  it  is  likely  that  the  worst  managers 
spend  the  most  time  managing"  (p.  253).  They  encouraged  the
use  of  management  systems  which  allow  students  to  manage 
themselves,  thus  freeing  the  teacher  to  act  in  the  more 
preferred  instructional  role,  a strategy  also  encouraged  by 
Siedentop  (1983).

In  a study  of  the  amount  of  class  time  used  by 
teacher  management,  Siedentop  (1976)  cited  Siedentop,  Rife, 
and  Boehm  in  stating  that  the  length  of  managerial  episodes 
in  a typical  physical  education  lesson  can  average  as  high 
as  2 minutes  to  3.25  minutes,  but  that,  with  training,  each 
episode  can  easily  be  reduced  to  less  than  30  seconds. 
Siedentop  (1983)  reported  on  a study  in  which  the  time  spent 
in  managerial  activity  was  reduced  from  over  10  minutes 
during  a 35-minute  period,  to  1 minute  or  less.  He 
suggested  the  following  strategies  to  reduce  instructional 
time  lost  to  managerial  activities: 

1.  Posting  the  first  activity  of  the  day,  where 
students  should  be  and  what  the  activity  will  be 

2.  Beginning  the  class  at  a definite  time 

3.  Spending  one  minute  or  less  taking  roll 

4.  Not  using  class  time  for  administrative  purposes 

5.  Teaching  students  a signal  for  attention,  and 
reacting  to  those  who  respond  quickly,  not 
slowly 

6.  Using  high  rates  of  feedback  and  positive 
interaction 

7.  Posting  records  of  time  spent  on  management 
(elementary  and  junior  high) 

8.  Use  enthusiasm,  hustle,  and  prompts  (hustle:
teacher  behavior  that  energizes  student
behavior;  prompt:  teacher  verbal  reminder  of
appropriate  behavior)  (pp.  72-77)
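
The reduction that Siedentop (1983) reported before the list above, from over 10 minutes of managerial activity in a 35-minute period to 1 minute or less, can be restated as shares of the period. The sketch below is in Python and treats "over 10 minutes" and "1 minute or less" as exactly 10 and 1 minutes. That simplification is an assumption made only for illustration, and the names are hypothetical.

```python
PERIOD_MINUTES = 35.0  # class period length reported in the text


def share_of_period(minutes, period=PERIOD_MINUTES):
    """Fraction of the class period taken up by the given minutes."""
    return minutes / period


def relative_reduction(before, after):
    """Fractional decrease from 'before' to 'after'."""
    return (before - after) / before


# Assumed point values for the reported bounds.
before_minutes = 10.0
after_minutes = 1.0

print(f"before: {share_of_period(before_minutes):.1%} of the period")
print(f"after:  {share_of_period(after_minutes):.1%} of the period")
print(f"reduction: {relative_reduction(before_minutes, after_minutes):.0%}")
```

Under these assumed values the reduction is 90%, which agrees with the figure given in the summary of this section.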

Punctuality  in  beginning  a class  is  a very  important
component  of  efficient  use  of  time  (Siedentop,  1983).  This, 
when  coupled  with  providing  activities  and  attending  to 
students,  should  result  in  little  instructional  time  being 
lost  by  students  waiting  for  teacher  assistance  (Graham  & 
Heimerer,  1981).  Graham  and  Heimerer  reported  that  the 
climate  of  the  learning  environment  should  be  one  that  is 
businesslike  and  task-oriented  and  yet  simultaneously  warm 
and  convivial.  They  stated  that  the  classes  of  an  effective 
teacher  begin  and  end  on  time,  and  an  effective  teacher 
systematically  organizes  the  classroom  so  that  there  is 
little  waiting  time  and  so  that  students  do  not  have  to 
depend  on  the  teacher  for  directions  or  materials  once  they 
begin  working. 

A major  factor  in  physical  education  instruction  is 
the  extensive  use  of  equipment  and  the  need  for  properly 
marked  field  spaces.  The  former  requires  the  teacher  to  set 
up  some  type  of  management  system  so  that  equipment  will  be 
at  the  site  of  the  class  and  distributed  quickly  to  avoid 
losing  academic  learning  time.  The  need  for  marked  field 
spaces  requires  teacher  planning  and  preparation  before  the 
class  begins  so  that  students  can  move  smoothly  into 
appropriate  involvement  with  the  skill/activity  of  the  day. 
Although  no  studies  directly  related  to  physical
education/motor  skills  were  found,  Rutter  et  al.  (1979),  in  a
study  of  English  secondary  schools,  found  that  when  less
time  was  spent  on  setting  up  equipment,  handing  out  papers, 
and  routine  tasks,  student  behavior  was  better. 

Hawkins,  Wiegand,  and  Landin  (1985)  investigated 
another  situation  that  has  an  effect  on  time-on-task:  the
relative  degree  of  active  or  passive  supervision  by  the
instructor.  They  determined  that  the  degree  of  supervision 
engaged  in  by  the  teacher  was  an  important  factor  in 
maintaining  task  attention.  Active  supervision  was  defined 
as  a pattern  of  overseeing  in  which  the  teacher  alternated 
between  the  proper  use  of  general  observation  (whole  class 
general  safety  check)  and  frequent,  brief  instances  of  more 
focused  observation  with  individuals. 

The  necessity  for  teacher  monitoring  was  supported 
by  Tousignant  and  Siedentop's  (1983)  study,  reported  more 
fully  in  the  Lesson  Development  portion  of  this  chapter. 
They  found  that  keeping  students  on-task,  although  affected 
by  the  type  of  task  assignment  which  was  given  (i.e., 
implicit,  generally  explicit,  specifically  explicit),  was 
also  dependent  upon  the  monitoring  behavior  of  the  teacher. 
Over  a period  of  time,  students  who  were  not  interested  in 
the  stated  task  (a)  became  bystanders,  or  (b)  modified  the 
task.  A pattern  which  was  typical  for  the  students  was 

-To  listen  to  the  teacher's  presentation  of  a task 
or  act  as  if  they  were  paying  attention 

-At  first,  approach  novel  activities  with  vigor  and
enthusiasm 

-Then,  rapidly  reduce  their  rate  of  responses  to 
this  solicitation  either  when  they  had  reached  an 
acceptable  response  according  to  their  criteria,  or 
when  they  realized  that  they  were  not  able  to  do  it 
right  away 

-Progressively,  some  students  drifted  toward  task 
modification;  meanwhile,  other  students  became 
bystanders  until  another  task  was  presented 

-If  the  activity  lasted,  students  engaged  in  more 
task  modifications  and  were  likely  to  fall  in  the 
deviant  behavior  zone 

-The  slow-down  in  the  rate  of  responses  added  to  the 
modifications  tended  to  encourage  the  teacher  to 
introduce  new  tasks  frequently,  independently  of 
the  students'  level  of  mastery  of  the  task  (p.  50) 

Most  studies  have  dealt  with  teacher  management  and 
organization  techniques  which  affect  those  student  time-on- 
task  behaviors  a teacher  can  plan  and  control.  There  are 
times,  however,  when  a teacher  cannot  control  disruptions  to 
the  teaching  process.  No  studies  were  found  in  the  physical 
education/motor  skills  literature,  but  Fredrick  (1977), 
interested  in  the  effect  of  disruptions  on  student 
achievement  in  reading,  conducted  a study  using  184  high 
school  classes.  Fredrick  defined  disruptions  as  anything 
that  contributed  to  off-task  time  by  stopping  the  progress 
of  the  lesson.  He  ascertained  that  higher  achievement  in
secondary  classes  was  associated  with  a lower  frequency
of  disruptions.  In  the
high-achieving  classes,  25%  of  the  available  on-task  time
was  wasted  by  interruptions,  while  as  much  as  49%  of  the
task  time  was  lost  by  low-achieving  classes. 

In  summarizing  the  findings  related  to  efficient  use 
of  time  in  instruction,  it  appears  clear  that  keeping 
students  involved  in  classwork  should  have  a positive  effect 
on  their  achievement.  There  are  a variety  of  strategies 
which  can  be  used  to  reduce  instructional  time  lost  to 
activities  of  a managerial  nature.  These  strategies  include 
such  teacher  behaviors  as  starting  class  on  time,  using 
techniques  for  handling  equipment  efficiently,  taking  roll 
quickly,  using  enthusiasm  and  prompts,  establishing 
procedures  for  handling  disruptions,  routinizing  many 
repetitive  activities,  and  using  management  games.
Siedentop  (1983)  pointed  out  that  the  amount  of  time  spent
in  managerial  activities  can  be  reduced  by  as  much  as  90%.
A teacher  should  be  businesslike  and  task-oriented,  but  not
at  the  risk  of  losing  a warm  and  convivial  relationship  with
the  students.  Another  situation  which  has  a direct  effect
on  time-on-task  is  the  degree  of  supervision  by  the
instructor  and  whether  the  supervision  is  active  or
passive.

PRINCIPLE:  If  the  teacher  is
efficient  in  the  use  of  instructional
time  by  beginning  and  ending  class  on
time,  organizing  so  as  to  spend  limited
time  on  administrative  duties,  keeping
students  involved  in  content  activities,
and  monitoring  class  activities,
achievement  should  increase.

Lesson  Development 

When  beginning  a lesson,  telling  students  the 
general  framework  of  the  lesson  that  is  to  follow  or  giving 
them  the  main  ideas  or  objectives  of  the  lesson  has  been 
said  to  facilitate  learning  (Florida  DOE,  1983).  Although 
research  findings  are  inconsistent  regarding  the  specific 
nature  and  use  of  objectives,  those  studies  which  have  found
an  effect  have  usually  favored  the  presentation  of
objectives  (Duchastel  & Merrill,  1973).  Duchastel  and
Merrill  pointed  out  the  complexity  of  the  issue  due  to  the 
"wide  array  of  variables  involved"  (p.  63).  The  difficulty 
in  clarifying  the  type  of  objective  and  manner  of 
presentation  is  that  there  is  no  agreed-upon  definition  of 
objectives  which  is  generalizable  across  studies  (Florida 
DOE,  1983). 

In  the  physical  education  literature,  Hollingsworth 
(1975)  cited  Mace,  who  suggested  that  working  toward  a 
specific  goal  would  lead  to  a higher  level  of  performance 
than  would  working  toward  an  abstract  goal,  such  as  "Do  your 
best"  (p.  64).  Hollingsworth  designed  a study  to  evaluate 
the  effects  of  specific  performance  goals  and  verbal
encouragement  on  the  learning  of  motor  tasks.  The  study
compared  the  performance  of  junior  high  students  on  a gross
motor  juggling  skill  under  two  conditions:  verbal
encouragement  (VE)  and  performance  goals  (PG).  Subjects  in
the  VE  group  were  given  verbal  encouragement  ("Do  your
best,"  "Try  harder,"  "You  can  do  better")  preceding  and 
during  the  practice  sessions;  subjects  in  the  PG  group  were 
given  a specific  goal  to  work  toward  based  on  the  previous 
day's  results.  The  goal  was  always  one  whole  number  above 
the  average  number  of  catches  made  during  the  previous 
session.  Both  experimental  groups  and  the  control  group 
received  information  about  the  number  of  catches  they  made 
during  the  previous  session  and  their  average  score. 

Although  she  found  no  significant  difference  in  the
performance  of  the  three  groups,  the  mean  number  of  catches
for  the  PG  group  exceeded  that  of  the  VE  group  and  both
exceeded  the  performance  of  the  control  group  (mean  number
of  catches  5.00,  4.32,  and  3.72,  respectively).  It  appears
possible  that  Hollingsworth  found  no  significant  differences
in  performance  because  all  three  groups  received  feedback
about  their  performances.  Subjects  with  knowledge  of  their 
scores  can  set  goals  for  themselves.  Some  subjects  may  have 
already  set  incremental  goals  for  themselves,  thus  providing 
all  groups  with  the  treatment  effect  of  setting  specific 
goals.  Hollingsworth  also  stated  that  the  goals  given  to
the  subjects  may  not  have  been  difficult  enough. 

Malmud  (1974)  was  interested  in  the  effect  of 
knowledge  of  specific  standards  or  goals  on  student  grade 
achievement  and  designed  a study  using  skills  in  archery  as 
the  dependent  variable.  She  determined  that  knowledge  of 
specific  performance  standards  did  result  in  improved  motor 
performance  although  her  analysis  of  data  revealed  a 
significant  improvement  only  for  males  in  the  sample 
population.  Although  females  showed  some  gain  after  the 
treatment,  the  gain  was  not  statistically  significant,  a 
fact  she  attributed  to  setting  performance  standards  too  low 
to  have  sufficiently  motivated  the  subjects.  Dey  and  Maur 
(1965)  found  that  difficult,  but  attainable,  goals  produced 
a higher  level  of  performance  than  did  easy  goals.  In  their 
study  they  investigated  the  relation  of  performance  to  ego 
motivation.  Subjects  were  given  the  task  of  canceling  the 
a's,  e's,  and  i's  which  appeared  in  lengthy,  random  series 
of  letters  of  the  English  alphabet.  After  establishment  of 
their  pretest  average,  the  subjects  were  given  an  artificial 
goal  purported  to  be  either  the  average  of  their  classmates 
or  the  average  of  all  college  students  throughout  the  world. 
Subjects  assigned  a goal  20%  higher  than  their  pretest
average  improved  24%;  the  improvement  on  goals  set  40%
higher  was  27%,  60%  goal  improvement  was  33%,  80%  goal
improvement  was  36%,  and  100%  goal  improvement  was  31%.  Dey
and  Maur  concluded  from  the  data  that  the  higher  the  level 
of  difficulty  of  the  goal,  the  better  the  performance  to  a 
certain  optimal  limit.  After  this  limit,  performance 
improved,  but  at  a lower  percentage.  It  appears,  therefore, 
that  the  use  of  objectives,  perhaps  in  the  form  of 
performance  standards,  can  increase  student  achievement; 
however,  the  standards  have  to  be  set  high  enough  to 
motivate  the  students  to  achieve,  but  low  enough  not  to 
discourage  them.  Students  must  also  be  aware  of  the 
standards  and  goals  before  instruction  begins. 
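
The Dey and Maur (1965) figures reported above can be tabulated to make the optimal limit visible. The sketch below is in Python and assumes that both the goal levels and the improvements are percentages relative to each subject's pretest average, as the passage describes. The dictionary and function names are illustrative only.

```python
# Goal level (percent above pretest average) -> reported improvement (percent).
improvement_by_goal = {20: 24, 40: 27, 60: 33, 80: 36, 100: 31}


def optimal_goal(data):
    """Return the goal level with the largest reported improvement."""
    return max(data, key=data.get)


best = optimal_goal(improvement_by_goal)

for goal, gain in improvement_by_goal.items():
    note = "  <- largest improvement" if goal == best else ""
    print(f"goal +{goal}%: improvement {gain}%{note}")
```

Improvement rises with goal difficulty through the 80% goal and is lower at the 100% goal, which is the pattern Dey and Maur described as an optimal limit.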

Several  researchers  cited  earlier  (Graham  &
Heimerer,  1981;  Hollingsworth,  1975;  Mace,  cited  by 
Hollingsworth,  1975;  Zaichkowsky,  Zaichkowsky,  & Martinek, 
1978)  pointed  out  the  value  of  a specific  rather  than  a 
general  goal  statement.  Graham  and  Heimerer,  after  a survey 
of  the  literature,  concluded  that  students  should  know 
exactly  what  is  expected  of  them  and  that  the  goals  of 
instruction  must  be  clear.  Siedentop  (1983),  drawing  on 
research  findings,  suggested  that  the  initial  presentation 
in  a class  should  follow  these  principles: 

1.  Instructional  tasks  should  be  clearly
identified.

2.  Importance  of  tasks  should  be  clearly
established.

3.  Students  should  understand  specifically  what
they  are  to  achieve.

4.  New  material  should  be  related  to  previous
experience.

5.  Students  should  have  an  opportunity  to  seek 
clarification. 

6.  Criteria  for  evaluation  should  be  specified. 

7.  Instructional  aids  should  be  ready  ahead  of  time 
and  used  simultaneously  with  verbal 
presentations.  (p.  193) 

One  of  the  purposes  of  stating  the  objectives  or 
goals  of  a class  or  practice  session  is  to  increase  on-task 
student  behavior.  Keeping  students  on-task  is  partially
dependent  upon  the  monitoring  behavior  of  the  teacher,  but
it  also  depends  on  how  tasks  are  presented.  Tousignant  and
Siedentop  (1983)  conducted  a study  of  teacher  accountability
systems  related  to  on-task  student  behavior  using  implicit
and  explicit  tasks  as  variables.

The  tasks  were  defined  as  follows: 

1.  An  implicit  task:  The  task  presentation  was
done  with  no  or  very  limited  information;  in
such  circumstances,  students  had  to  know  from
previous  experiences  how  to  play  the  roles  of  a
participant  in  such  tasks  (e.g.,  "Today  we  play
volleyball,"  or  "Squad  four,  you  go  to  the
parallel  bars").

2.  A generally  explicit  task:  The  task
presentation  included  a general  description  of
the  form  or  the  product  of  an  expected  response
(e.g.,  "We  will  run  a mile;  try  to  pace
yourself;  you  should  not  stop  moving").

3.  A specifically  explicit  task:  The  task
definition  included  precise  criteria  to  be  used
to  determine  the  level  of  success  (e.g.,  "You
need  to  hit  the  target  five  times  out  of  ten."
"More  than  five  unexcused  absences  will  get  you
an  F grade").  (p.  53)

The  data  revealed  that  explicit  tasks  typically  led  to  a 
high  rate  of  on-stated-task  behavior  while  implicit  task 
presentations  were  associated  with  a good  deal  of  task 
modification  since  almost  any  related  responses  were
accepted.  The  researchers  emphasized  the  use  of 
specifically  explicit  task  statements  in  order  to  elicit  a 
high  rate  of  on-stated-task  behavior. 

Another  method  of  communicating  goals  was  reported 
by  Siedentop  (1983),  who  stated  that  the  first  thing  that 
should  be  included  in  a good  lesson  plan  is  a description  of 
the  major  objectives  or  goals  for  the  day's  activities.  The 
lesson  plan  should  then  show  clearly  how  the  activities  are 
to  be  developed  so  that  they  reach  the  intended  goals  and  so 
that  the  interrelationships  among  the  activities  are 
evident.  These  goals  should  then  be  clearly  communicated  to 
the  students  "verbally  . . . or  . . . ahead  of  time  on  task
cards  or  other  such  devices"  (p.  184).

The  value  of  review  appears  to  be  well  substantiated 
by  studies  from  effective  teaching  literature  and  from 
professional  experience.  Wright  and  Nuthall  (1970),  in  a 
study  using  elementary  science  students,  reported  a +0.633 
correlation  (p  < .10)  with  mean  class  residual  scores  on  an 
achievement  test  of  elementary  science  knowledge  for  those 
teachers  who  used  a revision  (review)  at  the  end  of  class. 
Good  and  Grouws  (1979),  in  testing  the  effects  of  an 
experimental  treatment  on  fourth-grade  mathematics  classes, 
found  that  key  variables  separating  effective  and 
ineffective  teachers  (as  determined  by  achievement  records) 
included  a review  at  the  beginning  of  new  lessons  and  weekly
and  monthly  reviews.  In  their  study  of  the  literature  on 
this  topic,  the  Florida  Department  of  Education  researchers 
(1983)  concluded  that,  "If  reviews  are  conducted  at  the  end 
of  the  lesson  and  at  weekly  intervals  (or  occasionally 
longer  ones),  then  retention  as  well  as  the  amount  of 
learning  will  be  increased"  (p.  69).

Information  on  the  effectiveness  of  review  in 
physical  education  is  sparse.  Only  Graham  and  Heimerer 
(1981),  in  listing  instructional  techniques  that  could  be 
effective  in  physical  education,  stated  that  there  should  be 
continuity  within  and  between  lessons  and  promoted  the  use 
of  review  as  a teaching  behavior  to  link  lessons  together. 
Their  statement  came  as  a result  of  a survey  of  the  body  of 
effective  teaching  process-product  research  and  not  from  any 
studies  in  physical  education/motor  skills. 

A variety  of  studies  have  indicated  that  student
achievement  is  facilitated  by  teacher  presentation  of
instruction  (Good  & Grouws,  1977;  Medley,  1977,  1979;
Rosenshine  & Furst,  1973).  Siedentop  (1983)  stated  that

A lecture  or  demonstration  can  be  used  to  introduce 
a new  activity,  to  show  various  strategies  in  an 
activity,  to  teach  higher  level  skills,  or  for 
motivational  purposes  ...  a well-planned  lecture 
or  demonstration  can  be  enormously  effective.
(p.  180)

Lawther  (1977)  discussed  the  value  of  verbalization
ascertained  from  his  survey  of  the  research  and  stated  that
oral  guidance  by  the  teacher  could  accelerate  the  learning
of  motor  skills,  particularly  if  some  knowledge  is  involved
in  the  skill.  Examples  include  times  when  cognitive
knowledge  of  the  proper  form  and  performance  of  a skill  is
necessary  to  answer  questions  on  a written  test,  skill
sequences  occur  that  are  not  highly  integrated,  and
attention  must  be  focused  on  a portion  of  a skill  to  help
the  student  at  a more  advanced  level  improve  performance.
Lawther  warned  that  "when  talk  interrupts  needed  practice
time,  it  may  actually  be  a handicap  to  the  learner"
(p.  104).  We  see  that  oral  instruction  by  the  teacher  can
assist  the  learning  process,  but  that  it  should  not  be
presented  in  such  a manner  as  to  shorten  the  time  allocated
for  practice.  Lawther  was  more  positive  about  the  value  of
teacher  verbalization  for  the  advanced  performer  who  has
developed,  to  some  extent,  a skill  vocabulary.

Verbal  descriptions  or  explanations,  by  a teacher 
. . . may  be  meaningful  and  helpful  . . . [since]
highly  skilled  persons  often  have  verbal  symbols
attached  to  the  simpler  units  or  aspects  of  motor 
patterns;  hence  they  can  profit  from  verbal 
direction  emphasizing  cues  for  readjustments  and 
refinements.  (p.  125) 

Other  observable  behaviors  related  to  the  teacher's  talking 
or  performing  on  the  subject  matter  will  be  found  in  the 
categories  of  demonstrations  and  feedback. 

Hawkins,  Wiegand,  and  Landin  (1985)  confirmed
Lawther's  (1977)  conclusion  regarding  verbal  motivation  and 
practice  time.  They  concluded  that  an  effective  strategy 
for  teacher  talk  was  to  break  up  instructional  sessions  with 
more  opportunities  for  motor  responses,  thus  "limiting  the 
amount  of  verbal  information  that  students  had  to  process" 
(p.  252).  They  also  suggested  that  changing  to  more  visual, 
task-oriented  cognitive  activities  at  learning  stations 
would  enhance  the  students'  information  processing,  and  that 
a minimum  amount  of  time  should  be  used  to  maintain  an 
effective  instructional  system — although  some  time  must  be 
spent  on  transition  tasks  (moving  from  one  activity  or 
setting  to  another).  Part  of  excessive  time  involvement 
with  transition  tasks  may  be  associated  with  student 
confusion  regarding  how  to  use  an  instructional  system. 
"Excessive  time  spent  reading  task  cards  and  asking 
questions  to  clarify  tasks  indicates  that  the  instructional 
systems  are  either  too  complicated  or  are  not  taught  well"
(p.  251).  Hawkins,  Wiegand,  and  Landin  also  suggested  that
some  students  may  spend  excessive  amounts  of  time  in 
transition  activities  so  they  can  avoid  task  participation. 
Students  may,  in  this  way,  maintain  their  status  as  members 
in  good  standing  (Tousignant  & Siedentop,  1983)  without 
engaging  in  the  subject  matter. 

The  effect  of  questioning  for  the  purpose  of
orienting  students  or  assessing  their  understanding  of  the
subject  matter  under  discussion  has  seldom  been  explored  in 
the  physical  education  setting.  Graham  and  Heimerer  (1981), 
in  their  review  of  the  teacher  effectiveness  process-product 
research,  listed  teacher  questioning  as  a behavior  that 
should  prove  effective  for  physical  education  teachers. 

They  suggested  that  teacher  questions  should  be  narrow, 
direct,  single-faceted,  and  structured  to  enable  students  to 
understand  the  answer  sought  by  the  teacher.  The  use  of 
questions  was  also  supported  by  Siedentop  (1983),  who  stated
that  questioning  is  one  of  the  most  important  aspects  of  a 
teacher's  verbal  methods.  He  felt  that  questions  are  best 
understood  as  an  intentional  part  of  an  instructional 
objective. 

A question  is  a way  of  stating  the  task  and  setting 
the  conditions  under  which  it  will  be  performed.
Some  questions  imply  the  criteria  by  which  the  task
will  be  judged,  while  others  purposely  leave  the 
criteria  open.  (p.  204) 

Siedentop  emphasized  that  questions  must  be  clear  and 
precise,  like  any  instructional  objective.  He  suggested 
that  questions  fall  into  one  of  four  categories  and  that 
questions  from  each  category  are  used  for  different 
purposes:

1.  Recall  questions  require  a memory-level  answer. 

2.  Convergent  questions  call  for  analysis  and 
integration  of  previously  encountered  material. 

3.  Divergent  questions  call  for  solutions  to
previously  unencountered  material.  . . . Many 
different  answers  may  be  correct. 

4.  Value  questions  call  for  expressions  of  choice, 
attitude,  and  opinion.  Answers  are  not  judged  as 
right  or  wrong.  (p.  204) 

Anshel  and  Singer  (1980)  cited  Anshel's  study  in 
which  he  pointed  out  the  value  of  using  questions  in  a 
slightly  different  manner.  Anshel  divided  his  subjects  into 
four  groups  assigning  each  group  one  of  four  specific 
strategies  (imagery,  directed  attention,  rhythmic 
verbalization,  and  paraphrasing)  to  use  to  learn  the  gross 
motor  skill  of  juggling.  The  directed  attention  strategy 
required  the  use  of  what  Andre  (1979)  called  adjunct 
questioning.  A sample  adjunct  question  for  the  students  in 
the  directed  attention  group  was,  "Where  should  your  eyes  be 
focused  during  the  task?"  (p.  452).  He  found  that  subjects 
who  used  any  of  the  strategies  demonstrated  significantly 
better  performance  when  compared  to  nonusers  of  strategies 
and  that  the  directed  attention  group  (adjunct  questioning) 
performed  significantly  better  on  a complex  gross  motor 
skill  than  did  the  no-strategies  control  group,  on  both 
acquisition  and  retention  tests.  The  adjunct  questions 
apparently  worked  as  a verbal  cue  to  direct  the  student 
toward  an  appropriate  motor  response. 

After  a question  has  been  stated,  there  are  certain 
teacher  behaviors  which  can  influence  the  quality  of  the 
student  response.  The  effectiveness  of  a wait  time,  after
asking  a student  a complex  question  and  before  soliciting  a 
response,  was  well  documented  by  Rowe  (1974)  in  her  study  of 
science  classrooms.  In  analyzing  900  audiotapes,  she 
established  that  the  length  of  time  the  teacher  paused  for  a 
pupil  response  influenced  the  response.  Rowe  found  that 
student  responses  were  longer,  fewer  students  failed  to 
respond,  and  there  were  increased  contributions  by  slower 
students  when  teachers  implemented  a 3-  to  5-second  wait 
time  in  their  questioning  pattern.  Anderson,  Evertson,  and 
Brophy  (1979),  in  a study  with  first-grade  reading  groups, 
showed  that  the  rate  per  minute  of  choral  or  unison 
responses  was  negatively  related  to  achievement.  As  the 
number  of  unison  responses  increased,  achievement  decreased. 
Although  the  researchers  cautioned  against  wide 
generalization  of  the  results  without  further  study,  the 
concept  of  questioning  to  assess  student  knowledge  would 
appear  to  be  only  partially  enhanced  by  permitting  the 
students  to  answer  in  unison.  In  a unison  response,
students'  wrong  answers  or  nonparticipation  might  not  be
noticed,  and  the  teacher  might  never  become  aware  of 
students  who  are  not  comprehending  instruction. 

In  summary,  telling  students  the  goals  and 
objectives  of  a lesson  that  is  to  follow  should  lead  to  a 
higher  level  of  performance  than  when  not  using  such  a
statement.  Even  when  combined  with  verbal  encouragement,
the  use  of  performance  goals  was  found  to  be  more  effective 
in  raising  student  achievement.  It  was  pointed  out  that 
goals  should  be  difficult,  but  attainable.  The  specificity 
of  a goal  is  also  important:  Students  should  know  exactly
what  the  teacher  expects  of  them  during  the  ensuing  lesson.
The  relationship  of  goal  setting  to  keeping  students  on-task 
was  established.  Explicit  task  directions  generally  led  to 
a high  rate  of  on-task  behavior,  while  instructions  which 
left  the  task  implicit  often  led  to  performances  which  were 
modified  from  the  teacher's  intentions.  Keeping  students 
on-task  was  also  dependent  upon  the  intensity  of  teacher 
monitoring.

Reviewing  previously  learned  skills  and  knowledge 
and  reviews  at  the  end  of  a class,  at  the  end  of  the  week, 
or  later  intervals,  have  been  shown  to  be  effective  in 
increasing  a student's  learning  and  retention.  Oral 
instruction,  by  the  teacher,  has  also  proven  effective  in 
facilitating  the  learning  of  skills  when  it  was  at  the 
students'  level  of  understanding,  was  used  to  focus 
attention  on  portions  of  the  skill,  or  when  it  was  used  to 
clarify  points  of  form.  Oral  instruction,  however,  should 
not  eliminate  or  restrict  practice  or  it  may  actually  hinder 
learning.  It  was  reported  that  it  would  be  effective  to 
alternate  the  use  of  oral  instruction  and  practice  sessions
in  order  to  allow  students  to  concentrate  on  assimilating  a
smaller  amount  of  verbal  information  each  time. 

Use  of  single-faceted  questions  by  the  teacher  has 
been  found  to  be  effective.  In  physical  education,  the  use 
of  questioning  has  been  suggested  to  increase  the 
effectiveness  of  demonstrations.  Questions  have  also  been 
effective  in  drawing  students'  attention  to  components  of  a 
skill  that  they  may  be  performing  poorly.  The  use  of
wait-time  after  asking  a student  a complex  question  has  also  been
shown  to  aid  in  student  learning  and  to  increase 
contributions  by  slower  students.  Allowing  students  to  call 
out  answers  was  negatively  related  to  achievement. 

PRINCIPLE:  Student  achievement
should  increase  if  the  teacher  informs
students  of  the  goals  and  objectives  of
the  class;  provides  continuity  within
and  between  lessons;  conducts  reviews;
provides  oral  instruction  for  the
purpose  of  focusing  attention;  questions
student  comprehension  by  means  of
direct,  single-faceted  questions;  pauses
before  calling  for  a response  on  complex
questions;  and  limits  the  use  of  unison
response.

Demonstrations

The  use  of  demonstrations/modeling  behavior  has  long 
been  accepted  by  educators  as  an  effective  teaching 
strategy.  There  is  a great  deal  of  research  for  this  type 
of  learning  from  the  observational  learning  literature,  but 
it  has  usually  dealt  more  specifically  with  verbal  and 
social  behaviors  (Bandura,  1969).  Physical  educators  are 
cognizant  of  the  limitations  of  language  in  describing 
complex  movements;  thus,  they  rely  heavily  on  demonstration 
as  a means  of  communicating  motor  patterns.  Researchers 
such  as  Lawther  (1977)  and  McAuley  (1982)  emphasized  the 
effectiveness  of  using  demonstrations  in  learning  specified 
physical  education  skills  as  compared  with  control  groups 
which  did  not  have  demonstrations.  Feltz  and  Landers  (1977) 
compared  the  effects  of  using  demonstrations  and 
motivational  statements  as  part  of  instruction  on  a motor 
task.  The  instruction  was  presented  in  four  ways: 

1.  Two  demonstrations  with  a motivational 
statement,  "I  can  climb  to  the  5th  rung,  let's 
see  you  do  it." 

2.  Two  demonstrations  with  no  statement. 

3.  Use  of  the  motivational  statement  (above),  but 
no  demonstration. 

4.  No  demonstration  or  statement.  (p.  528) 

The  results  indicated  that  using  the  demonstration 
only  was  significant  (p  < .05).  Students  who  received  a 
demonstration  had  higher  performance  scores  (M  = 2.84)  than 
students  not  given  a model  demonstration  (M  = 2.41).  Data
on  the  value  of  motivational  statements  indicated  that  the
statements  had  little  influence  on  performance  except  when 
combined  with  modeling.  Groups  who  received  the  treatment 
combining  modeling  and  motivational  comments  scored 
significantly  higher  than  all  other  treatments  (p  < .05). 
Thus,  it  appears  that  the  use  of  demonstrations  is  very 
effective  in  the  learning  of  motor  skills  and  that  the  value 
of  demonstrations  can  be  significantly  increased  when 
combined  with  motivational  statements.  This,  Feltz  and 
Landers  stated,  was  in  agreement  with  previous  modeling 
literature.

It  is  not  just  observing  a demonstration  which 
results  in  greater  achievement.  The  demonstration,  to  be 
effective,  must  be  correct.  Martens,  Burwitz,  and  Zuckerman 
(1976)  used  boys  in  grades  2,  3,  7,  and  8 in  a study  of 
modeling  effects  on  motor  performance.  They  found  that 
subjects  who  observed  the  assigned  task  performed  correctly 
or  who  observed  progressive  improvement  on  the  task 
demonstrated  significantly  better  performance  (p  < .04)  than 
those  who  had  observed  an  incorrect  model  or  had  no 
demonstration  at  all. 

Interestingly,  the  performance  differences  in 
Martens,  Burwitz,  and  Zuckerman's  (1976)  study  were  not
observed  after  the  10th  trial,  suggesting  that  practice  may 
have  equalized  the  initial  benefit  of  observing  the  model.
They  reported  that  this  initial  period  has  been  referred  to
as  the  cognitive  phase  of  motor  skill  acquisition,  the 
period  of  time  when  the  learner  must  make  the  proper  mental 
conversion  between  what  was  observed  and  what  must  be 
performed.  Bandura  (1971)  suggested  that  persons  may  learn 
what  to  do  through  a demonstration,  but  if  they  do  not  have 
the  motor  capacity  to  make  the  response  required  for  correct 
performance,  they  will  be  incapable  of  demonstrating  what 
has  been  learned.  Conversely,  a motor  skill  may  have  rather 
simple  cognitive  and  motor  demands  for  achieving  a 
moderately  good  performance  level  which  can  be  acquired  in  a 
short  amount  of  time.  Refinement  of  the  skill  may  not 
require  more  information  (cognitive  component),  but  simply 
practice.  Martens,  Burwitz,  and  Zuckerman  concluded  that 
continued  teacher  demonstration  of  a skill  may  not  be  as 
effective  as  student  practice. 

Is  there  an  optimal  time,  in  the  process  of  learning 
a motor  skill,  to  present  a demonstration?  Thomas,  Pierce, 
and  Ridsdale  (1977)  compared  the  effects  of  a demonstration
presented  before  learning,  a demonstration  presented  halfway
through  learning,  and  no  demonstration.  The  subjects  were  7-  and  9-year-olds
(second  and  fourth  graders)  who  were  to  learn  a balancing 
task.  Findings  indicated  that  a demonstration  at  the 
beginning  significantly  (p  < .05)  aided  the  learning  of  both 
age  groups.  The  midway  model  had  a detrimental  effect
(p  < .05)  on  the  performance  of  younger  children,  but  helped
the  older  children.  The  performance  of  the  group  that  had
no  demonstration  was  significantly  less  effective  (p  < .05)
for  both  age  groups.  There  was  also  a significant  effect
(p  < .05)  for  age;  overall,  the  9-year-olds,  regardless  of 
modeling  condition,  balanced  longer  than  the  7-year-olds. 

The  researchers  interpreted  these  findings  in  the  light  of  a 
likely  increase  in  both  the  cognitive  processing  capacity 
and  the  larger  repertoire  of  motor  abilities  of  the  older 
children.  Once  a young  child  begins  to  acquire  a learning 
strategy,  the  introduction  of  a model  seems  to  interfere 
with  learning.  It  would  appear  that  younger  children 
develop  their  strategies  on  a trial-and-error  basis,  and  the 
addition  of  cues  from  a model  to  their  own  strategies  may 
result  in  more  information  than  they  can  process.  The 
researchers  pointed  out  that  if  the  cues  from  a model  are 
available  at  the  beginning  of  practice,  the  children  could 
use  those  cues  to  develop  a learning  strategy  rather  than 
follow  a trial-and-error  process. 

Few  studies  have  been  conducted  to  determine  whether
there  is  an  optimum  number  of  demonstrations  for  greatest
effectiveness.  Feltz  (1982),  in  a survey  of  the  literature, 
reported  that  previous  research  indicated  that  modeling 
effectiveness  is  enhanced  with  repetitive  demonstrations, 
but  that  the  specific  number  of  demonstrations  needed  for
optimal  modeling  effects  had  not  been  determined.  To
investigate  the  effects  of  age  and  number  of  demonstrations 
on  both  form  and  performance,  Feltz  designed  a study  using 
both  college-age  and  elementary-age  participants  performing 
a ladder-balance  task.  The  control  group  in  each  age 
bracket  received  no  demonstrations;  other  groups  received  4, 
8,  or  12  demonstrations.  Subjects'  performance  was  measured 
in  two  ways:  (a)  by  counting  the  number  of  rungs  climbed
each  time  (physical  performance)  and  (b)  from  judges'
ratings  of  students'  imitation  of  the  modeled  performance 
and  strategy  (form).  The  five  components  that  were  judged 
under  form  were  hands  on  the  side  of  the  ladder,  hands 
moving  up  the  ladder,  arms  extended  in  front  at  the  start, 
pulling  the  ladder  towards  oneself,  and  climbing  quickly. 
Feltz  obtained  a significant  main  effect  for  age;  college- 
age  students  had  higher  performance  and  form  scores  than 
elementary-age  students  (p  < .001).  The  only  significant 
difference  between  the  four  demonstration  treatments  was 
discovered  in  the  form  measurement.  Students  who  received 
12  demonstrations  scored  better  on  form  (p  < .01)  than  the
control  group,  and  the  12-demonstration  groups  scored  better
(p  < .0005)  on  quickness  than  those  receiving  four 
demonstrations  or  the  control  group.  Feltz  had  hypothesized 
that  college-age  students  would  require  fewer  demonstrations 
to  produce  modeling  effects  than  elementary-age  students,
but  her  hypothesis  was  not  supported.  Feltz  concluded  that
the  number  of  demonstrations  needed  to  create  positive 
modeling  effects  is  specific  to  the  task  and  dependent  on 
the  number  of  practice  trials. 

The  environment  in  which  a demonstration  is 
presented  appears  to  be  a factor  in  its  effectiveness.  In  a 
meta-analysis  of  the  effects  of  feedback  in  the  teaching  of 
motor  skills,  Rothstein  (1982)  found  that  the  nature  of  the 
environment  can  affect  the  performer's  ability  to  perceive 
and  process  information  during  a demonstration.  Variables 
which  can  affect  the  acquisition  of  information  are  as 
follows:  the  number  of  important  input  items,  the
prominence  of  the  important  input,  and  the  ratio  of
important  to  unimportant  input.  Rothstein  determined  that 
the  performance  environment  must  be  manipulated  for  the  most 
effective  learning  to  take  place.  This  finding  supported 
that  of  Yando,  Zeitz,  and  Zigler  (1978),  who  stated  that 
children  may  attend  to  as  many  task-irrelevant  as  task-relevant
cues  and  recommended  that  their  attention  be  directed  to  the 
important  features  of  a model's  actions. 

Weiss  (1983)  was  also  concerned  with  the  effect  of 
students'  age  upon  the  effectiveness  of  a demonstration.
She  concluded  that  environmental  factors  in  a demonstration
seemed  to  have  a greater  effect  upon  the  learning  of  younger 
students  than  that  of  older  ones.  Weiss,  focusing  on  the
role  age-related  or  developmental  factors  play  in  the
modeling  process,  based  her  conclusions  on  recent  research 
in  developmental  differences.  She  stated  that  "the  ability 
to  attend  selectively  to  the  appropriate  cues  necessary  for 
successful  performance  and  to  ignore  irrelevant  cues  does 
not  stabilize  until  about  11  years  of  age"  (p.  49).
Specifically,  the  ages  5 to  7 years  show  a developmental 
increase  in  cognitive  and  perceptual  organization  of  young 
children.  Children  under  five  years  of  age  are  distracted 
easily  and  tend  to  focus  on  as  many  task-irrelevant  as
task-relevant  model  cues.  Because  of  this,  young  children
process  visual  information  more  slowly  than  older  children 
and  adults.  A summary  of  Weiss's  (1982)  suggestions  for
minimizing  distractions  is  as  follows:

1.  Face  children  away  from  the  sun  or  other
activities.

2.  Point  out  the  appropriate  task-relevant
behaviors.

3.  Use  labels  that  are  interesting  to  the  children
for  important  points.  Use  the  labels  as  verbal
instructional  cues.  Encourage  children  to
think  aloud  using  these  labels.

4.  Talk  to  the  children  as  the  skill  is 
demonstrated. 

5.  Make  sure  the  demonstrator  is  enthusiastic, 
competent,  has  a clear  voice,  and  maintains 
continuous  eye  contact. 

6.  Don't  make  the  task  too  complicated;  match  it  to 
the  skill  level  of  the  child.  (p.  49) 

Weiss  also  recommended  that  teachers  ensure  that  the
appropriate  lead-up  skills,  necessary  for  imitating  the
skill  sequences  demonstrated,  have  been  acquired.

Oral  instruction  is  an  important  component  in  a
demonstration.  Weiss  (1983)  determined  that  the  traditional 
model  for  demonstrations,  that  of  presenting  new  skills  to 
children  beginning  with  an  oral  explanation  of  the  task 
followed  by  a visual  demonstration,  is  not  as  effective  as 
the  use  of  concurrent  verbal  cues  directing  children's 
attention  to  task-relevant  skill  components.  She  examined 
the  effects  of  age,  modeling,  and  verbal  self-instruction  on
children's  performance  of  a sequential  motor  task.  Group
differences  on  the  following  performance  measures  were
compared:  (a)  number  of  trials  required,  (b)  percentage  of
trials  correctly  executed,  and  (c)  average  number  of  skill
parts  correctly  performed  per  trial.  In  addition,  the 
percentage  of  inattentive  behaviors  was  analyzed  because 
attention  was  hypothesized  to  be  related  to  and  influenced 
by  motor  performance.  Significant  results  (p  < .001)  were 
obtained  for  age  and  model  type  main  effects.  Specifically, 
7-  to  8-year-old  children  required  significantly  fewer 
trials  (M  = 6.21)  than  their  younger  (4-  and  5-year-old)
counterparts  (M  = 7.76);  they  performed  more  correct  skill 
parts  per  trial  (M  = 4.91,  M = 3.81)  and  completed  a greater 
percentage  of  correct  trials  (M  = 42.5%,  M = 18.4%).
Moreover,  older  children  exhibited  significantly  fewer
inattentive  behaviors  (M  = 10.4%)  than  the  younger  children
(M  = 18.4%).  Weiss  commented  that  the  significant  age
differences  in  performance  and  in  verbal-cognitive  abilities
suggest  that  "physical  and  cognitive  capacities  of  children 
affect  their  motor  skill  learning,  and  that  these  capacities 
require  different  instructional  strategies"  (p.  195);  that 
children's  physical  abilities  are  related  to  their  cognitive 
levels;  and  that  imitation  improves  with  age  because  of
advances  in  cognitive  skill.  She  concluded  that  modeling
could  be  facilitated  with  elementary  children  by  using 
concurrent  verbal  cues  while  demonstrating  a task.  This 
conclusion  was  also  supported  by  Feltz's  (1982)  research. 

Siedentop  (1983),  drawing  upon  his  research, 
emphasized  the  value  of  oral  instruction  in  the  form  of 
lectures  and  demonstrations  to  introduce  something  new,  to 
show  game  or  activity  strategies,  to  teach  higher  level 
skills,  or  for  motivational  purposes.  He  cautioned  teachers 
to  use  vocabulary  which  was  appropriate  for  the 
developmental  level  of  the  students  during  an  oral 
presentation.

The  language  must  be  appropriate  not  only  for  the
age  level  of  the  students,  but  also  for  their  skill
development  level.  Every  sport  and  movement
activity  has  a technical  skill  language  of  its  own
and  it  is  highly  inappropriate  to  assume  that
students  know  that  language  as  well  as  you  do.
(pp.  180-181)

Siedentop  recommended  a pace  of  110  words  per  minute  and  the
use  of  visual  aids  to  enhance  student  interest,  clarify 
points,  and  focus  attention  upon  pertinent  points. 

Lawther  (1977)  cautioned  against  the  possible 
overuse  of  verbal  explanation  and  description  in  dealing 
with  some  types  of  learners.  Although  he  agreed  that 
demonstrations  have  been  a successful  instructional 
technique  for  the  skilled  and  average  learner,  he  cautioned 
that,  in  dealing  with  low-skill  learners,  the  very  young,  or 
the  very  old,  "demonstration  by  itself  is  unlikely  to  be 
completely  adequate  . . . [and  use  of]  verbal  explanation
and  description  . . . appears  to  have  been  greatly
exaggerated"  (p.  93).  In  these  situations,  he  suggested
that  the  teacher  should  introduce  the  skills  in  very  simple 
units,  perhaps  helping  the  student  perform  the  skill  during 
early  efforts,  thus  "etching"  (p.  93)  the  gross  pattern  in 
the  minds  of  these  special  students. 

Several  researchers  have  pointed  out  that  televised 
models  are  as  effective  as  live  models  for  imitative 
behavior  (Feltz,  1982;  Feltz  & Landers,  1977;  Martens, 
Burwitz,  & Zuckerman,  1976).  Siedentop  (1983),  however, 
believed  that  a live  demonstration  is  still  the  best  means 
for  modeling  a skill.  Further,  he  emphasized  that 
demonstrations  should  be  carefully  planned;  they  should  not
be  done  off  the  cuff.  Some  suggestions  he  made  for
effective  demonstrations  are  as  follows: 

1.  The  demonstrator  should  be  a good  model. 

2.  All  materials  should  be  set  up  and  ready  to  go 
ahead  of  time. 

3.  It  should  be  directed  to  the  students  visually 
and  conceptually  and  performed  in  a manner  they 
understand.

4.  Each  important  feature  should  be  identified, 
explained,  and  performed  in  sequence. 

5.  Instructional  aids  should  be  used  if  they  serve 
to  emphasize  or  clarify  difficult  material  and 
do  not  take  too  much  extra  time. 

6.  Safety  points  should  be  emphasized  if  relevant. 

7.  The  demonstration  should  be  performed  in 
conditions  that  are  as  close  as  possible  to 
those  under  which  the  skill  will  be  performed. 

8.  Feedback  should  be  obtained  from  students  to  see 
if  they  understand  the  relevant  features.  This 
is  usually  best  done  by  intermittently  asking  a 
student  to  identify  a feature.  (pp.  195-196) 

Siedentop's  (1983)  last  suggestion  (listed  above)
supports  Anshel  and  Singer's  (1980,  reported  in  the  Lesson
Development  section)  conclusion  in  which  they  demonstrated
the  value  of  using  questions  to  aid  in  the  learning  of  a
gross  motor  skill.  Their  strategy,  called  adjunct
questioning,  was  used  during  demonstration  and  practice
settings  to  direct  a performer's  attention  to  significant
points  of  form.  These  adjunct  questions  apparently  worked
as  a verbal  cue  to  direct  the  student  toward  an  appropriate
motor  response.

In  summary,  the  effectiveness  of  demonstration/ 
modeling  behavior  in  physical  education  has  been  amply 
proven,  although  there  are  variables  which  affect  the  level
of  effectiveness.  The  demonstration,  to  be  effective,  must
be  correct.  It  also  appears  to  be  more  effective  to  have 
the  demonstration  at  the  beginning  of  a lesson  rather  than 
later  in  the  learning  process.  Whether  there  is  an
optimum  number  of  demonstrations  has  not  been  proven  since
there  are  too  many  variables  involved.  It  does  appear  that
repeating  a demonstration  may  aid  in  student  learning,  but
continued  demonstration  may  interfere  with  the  number  of 
practice  trials  a student  is  able  to  perform.  In  this  case, 
the  use  of  continued  demonstrations  may  actually  interfere 
with  effective  skill  acquisition. 

The  performance  environment  has  been  found  to  affect 
the  acquisition  of  information.  It  is  important  for  the 
teacher  to  manipulate  the  environment  in  which  a 
demonstration  is  to  be  presented  to  emphasize  the  important 
input  items.  Children's  attention  must  be  directed  to  the 
particular  features  of  a demonstration  which  are  important. 
The  use  of  concurrent  verbal  cues  is  effective  in 
accomplishing  this  task.  The  importance  of  using  vocabulary 
at  the  students'  level  of  understanding  was  also  pointed 
out.  Other  suggestions  for  effective  demonstrations  were  as
follows:  having  materials  ready  ahead
of  time,  demonstrating  in  conditions  as  similar  as  possible 
to  those  under  which  the  skill  will  be  performed,  using 
visual  aids,  and  having  a good  model  as  a demonstrator.

PRINCIPLE:  The  use  of
demonstrations  in  the  presentation  of
subject  matter  will  facilitate  higher
student  achievement.

PRINCIPLE:  Factors  which  contribute
to  the  effectiveness  of  demonstrations
are  placement  at  the  beginning  of  a
learning  task,  correctness,  use  of
verbal  task-related  cues,
appropriateness  to  the  skill  level  of
the  students,  manipulation  of  the
performance  environment  to  reduce
task-irrelevant  cues,  performance  of  the
skills  in  sequence,  conditions  which  are
as  close  as  possible  to  those  under
which  the  skill  will  be  performed,
emphasis  on  safety  points,  use  of
appropriate  vocabulary,  and  the  verbal
presentation  skills  of  the  demonstrator.

PRINCIPLE:  If  students  are  asked  to
respond  to  specific  questions  related  to
the  skill  they  are  performing,  learning
will  be  increased.

Practice

Practice  is  a key  component  of  learning  motor 
skills,  both  simple  and  complex.  Mohr  (1960)  reported  on  30 
studies  and  stated  that  although  specific  instruction 
resulted  in  the  learning  of  skills,  without  practicing  there 
was  practically  no  improvement.  She  also  reported  that 
practicing  skills  to  the  point  of  habit  formation  increased 
the  probability  of  the  subsequent  use  of  these  skills.  East 
(cited  in  Mohr)  found  that  it  was  possible  to  improve  with 
practice  only  and  no  instruction,  thus  emphasizing  the 
importance  of  practice  even  at  the  expense  of  instruction. 
Martens,  Burwitz,  and  Zuckerman  (1976)  also  confirmed 
previous  findings  that  subjects  improved  with  practice  and 
became  more  consistent  in  their  performance. 

If  practice  is  necessary  for  increased  achievement, 
will  more  time  spent  on  practice  automatically  result  in 
better  performance?  Is  there  a linear  relationship  between 
achievement  and  the  time  allotted  for  practice?  Researchers
have  shown  that  the  relationship  between  time-on-task  and
student  achievement  is  curvilinear  rather  than  linear.
The  Florida  Department  of  Education  (1983),
developers  of  the  Florida  Performance  Measurement  System 
(FPMS),  stated  that  there  is  no  question  that  student
on-task  behavior  is  related  to  achievement,  but  there  might  be
limits  to  the  generalizability  of  the  conclusion.  They
reported  that  there  is  evidence  of  nonlinearity  of  relations
between  time-on-task  and  achievement  gain  such  as  the  data
reported  by  Rim  and  Coller  (cited  by  Florida  Department  of
Education,  1983).  This  same  effect,  they  claim,  is  also
shown  in  the  Fisher  et  al.  (1978)  report,  "although  it  was
not  tested"  (p.  66).  They  reported:

On-task  behavior  does  not  alone  guarantee  increased 
achievement  . . . the  task  must  be  appropriate  for
student  capability,  and  it  must  be  relevant  to  the
learning  task  ...  a success  rate  of  70  to  80%  is 
compatible  with  research  findings.  Finally,  the 
time  required  to  learn  a given  assignment  is  just  as 
important  as  success  rate.  (p.  65) 

In  physical  activity,  the  number  of  trials  a 
performer  attempts  has  a greater  relationship  to  learning 
than  the  amount  of  time  spent  in  the  activity.  This 
conclusion  was  reached  by  Lay  (1979),  who  reported  on  the 
practical  application  of  selected  motor  learning  research. 
She  suggested  that  teachers  needed  to  utilize  their  time 
more  efficiently  so  that  maximum  participation  of  each 
student  is  guaranteed. 

This  need  for  more  time  for  student  engagement  was 
emphasized  by  Anderson  and  Barrette  (1978).  They  commented 
on  the  need  for  effective  time  management  and  class 
organization  as  they  relate  to  practice  and  stated  that  if 
teachers  "talk  less  frequently  and  students  spend  less  time 
waiting,  more  time  could  be  provided  for  students  to  engage 
in  such  substantive  movements  as  practice,  gameplaying,
exercise,  explore,  etc."  (p.  23).  Hawkins,  Wiegand,  and
Landin  (1985)  reinforced  this  statement  by  their  conclusion 
that  many  teachers  spend  too  much  time  in  whole-class 
instruction,  thus  reducing  the  time  students  can  be  motor- 
appropriately  engaged.  They  suggested  that  teachers  provide 
instruction  in  short  episodes  as  needed,  allowing  time  for 
practice  between  episodes.  This  would  limit  the  amount  of 
verbal  information  that  students  had  to  process  at  any  one 
time.

Although  professional  intuition  would  lead  us  to 
endorse  the  use  of  goals  at  the  beginning  of  practice 
sessions,  the  effectiveness  of  establishing  goals  for 
practice  is  also  well  supported  in  research.  A number  of 
studies  on  the  effectiveness  of  establishing  goals  were 
reported  in  the  Lesson  Development  portion  of  this  chapter 
(Malmud,  1974;  Dey  & Maur,  1965;  Duchastel  & Merrill,  1973;
Florida  DOE,  1983;  Graham  & Heimerer,  1981;  Hollingsworth, 
1975;  Mace,  cited  by  Hollingsworth,  1975;  Siedentop,  1983; 
Zaichkowsky,  Zaichkowsky,  & Martinek,  1978).  Conclusions
drawn  from  those  studies  are  as  follows:  (a)  Working  toward
a goal  leads  to  a higher  level  of  performance,  and  (b)  use
of  a specific  goal  results  in  higher  student  achievement 
than  an  abstract  goal  (e.g.,  "Do  your  best").  Hollingsworth 
(1975)  suggested  that  specific  goals  may  be  used  to  motivate 
subjects  who  have  a low  degree  of  motivation.  Siedentop
(1983)  confirmed  that  view  and  said  that  "the  components  of
an  instructional  system  can  increase  the  students'
motivational  level.  When  students  have  clear  goals  and  know
what  criteria  exist  for  measuring  achievement  towards  those
goals,  they  can  monitor  their  own  progress  and  this  itself
improves  motivation"  (p.  176).  He  also  stated  that

Boredom  is  not  a factor  in  physical  education  as 
long  as  successful  learning  experiences  are  being 
arranged  and  clear  progress  is  being  made.  Students 
tend  to  get  bored  by  doing  repetitive  activities 
that  require  little  skill  or  effort.  (p.  154) 

Morris  (1981)  stated  that  effective  practice  depends 
on 

1.  A clearly  defined  goal  or  purpose  for  the
practice.

2.  A carefully  structured  progression 

a.  Simple  to  complex. 

b.  Stationary  to  moving. 

c.  Known  to  unknown. 

d.  Offensive  to  defensive. 

e.  Self-paced  to  externally  paced. 

3.  Provision  for  feedback.  (p.  49) 

Siedentop  (1983),  drawing  upon  his  research,  dealt 
with  the  question  of  just  what  types  of  class  formats  might 
be  considered  practice  and  what  types  of  goals  might  be 
established  for  those  practice  sessions.  He  began  by 
encouraging  the  development  of  a process  of  goal  development 
that  begins  with  the  establishment  of  major  program  goals 
such  as  "the  traditional  four  goals  of  physical  development, 
skill  development,  mental  development,  and  social 
development  that  have  been  widely  accepted  in  physical 
education since early in this century" (p. 150). Next,
activities  which  contribute  to  the  achieving  of  these  goals 
should  be  selected.  These  activities  constitute  units  of 
instruction  which  should  be  "long  enough  to  ensure  real  goal 
accomplishment"  (p.  154).  Daily  lesson  plans  should  reflect 
the  terminal  objectives  established  for  that  unit  of 
instruction.  Lesson  formats  or  instructional  methodologies 
referred  to  by  Siedentop  include  lectures,  demonstrations, 
drills,  individual  tasks,  miniscrimmages,  and  games. 

The  use  of  instructional  goals  or  objectives  for 
games  is  one  that  requires  closer  examination.  To  quote 
Siedentop  (1983): 

How  often  have  you  seen  students  participate  in 
bump,  pass,  setting,  and  serving  drills,  only  to 
observe  no  evidence  of  those  skills  once  a 
volleyball game begins? . . . The fault lies in
the instructional system. Games can be directed
toward  the  achievement  of  specific  objectives  and 
still  be  fun.  Indeed,  a strong  case  can  be  made  for 
the  proposition  that  students  will  gradually  learn 
to  have  more  and  more  fun  in  game  play  once  they 
begin  to  use  newly  acquired  skills  in  the  game 
situation.  (p.  182) 

Siedentop  went  on  to  agree  that  constructing  instructional 
objectives  for  game  situations  is  more  difficult  than  for 
drill  purposes,  but  that  this  should  not  deter  a teacher 
from  making  the  effort.  Objectives  for  game  play  should 
call  the  students'  attention  to  the  application  of  important 
instructional goals. Emphasis on the achieving of these
goals  tends  to  make  the  game  play  considerably  more 
educational  and,  in  the  long  run,  more  fun. 

An important consideration in selecting the specific
activities/skills to be covered in a practice and in
establishing the amount of time to be allocated to each
stage of a practice is a concept known as success-level, or
academic learning time (Fisher et al., 1978). Fisher et al.
stated that their findings consistently pointed out the
importance of students performing tasks with a high level of
success (i.e., correctly). The average student in their
study spent about half the time working on tasks that
provided high success; however, "Students who spent more
time than the average in high-success activities had higher
achievement scores . . . better retention of learning . . .
and more positive attitudes toward school" (p. 107). One
exception to this was noted by the researchers. They found
that students who were generally skilled at learning did not
require as high a percentage of time at the high-success
level in order to perform well. Good and Brophy (1987)
reported that recent research findings indicate that success
rates of 90-100% produce more learning. In applying this
concept, it is important to remember the cyclic nature of
learning as stated by Fisher et al.:

Learning  is  a process  of  moving  from  not  knowing  to 
knowing.  When  new  material  is  introduced  the 
student  most  likely  will  not  understand  completely 
and will make some errors. . . . Eventually, the
student will perform correctly, although probably
with some effort. . . . At some later point, the
student  knows  the  material  so  well  that  further 
practice  is  of  minimal  value;  it  is  time  to  move  on 
to  something  new.  (p.  107) 

Students  given  high-success  assignments  will  probably  stay 
on-task  and  complete  them  successfully.  If  assignments  are 
too  difficult,  students  may  not  complete  them  and  may  lose 
interest  and  motivation  for  learning. 

Another  aspect  of  practice  that  is  not  often 
considered  is  that  of  warm-up  exercises.  It  has  been 
assumed  by  many  teachers  and  coaches  that  warm-up  exercises 
do  have  a beneficial  effect  on  performance,  and  warm-ups 
have  long  been  accepted  as  an  appropriate  way  to  begin  a 
physical  education  class.  The  value  of  warm-ups  is  easily 
appreciated  in  relation  to  the  physiological  principle  of 
preparing  the  muscles  of  the  body  to  work  more  effectively 
with  less  damage,  but  there  have  been  some  studies  dealing 
with  the  hypothesis  that  warm-up  exercises  may  also  have  a 
beneficial  effect  on  performance.  Whether  or  not  warm-ups 
increase  the  motor  learning  that  will  take  place  during  the 
rest  of  the  class  time  should  be  of  interest  to  the  teacher 
who  wishes  to  plan  for  the  most  effective  use  of  available 
practice  time.  In  her  survey  of  the  literature,  Mohr  (1960) 
concluded  that  "the  evidence  concerning  the  beneficial 
effects  of  warm-up  exercise  is  scanty  and  contradictory" 
(p. 330). Six of the 12 researchers did report beneficial
effects  for  skill  development  from  warm-ups  (Blank,  1955; 
DeVries,  1959;  Michael,  Skubic,  & Rochelle,  1957;  Pacheco, 
1957,  1959;  Paseltiner,  cited  by  Mohr,  1960;  Thompson, 
1958).  The  warm-ups  in  these  studies  were  closely  related 
to  the  skill  to  be  performed  and  were  performed  extensively 
rather  than  for  a very  short  duration.  Therefore,  it 
appears  that  when  warm-ups  closely  match  the  skill  or 
activity  which  is  to  be  taught  and  when  they  are  performed 
for  a somewhat  lengthy  period  of  time  (longer  than  5 
minutes),  they  will  contribute  to  increased  skill 
acquisition. 

Previous  researchers  have  indicated  that  the 
effectiveness  of  practice  is  conditioned  by  at  least  three 
variables: length of the practice session, time
distribution of practice, and the nature of the material
(Florida  DOE,  1983).  In  planning  an  effective  lesson  in 
physical  education,  knowledge  of  these  variables  is  a key 
component.  Should  a skill  be  practiced  continually  with 
little  or  no  rest  between  the  beginning  and  end  of  the 
activity (massed practice), or should practice be of a
shorter  duration  of  time  and  interspersed  with  rest  or 
alternate  kinds  of  activities  (i.e.,  distributed  practice)? 
Most  researchers  investigating  the  value  of  massed  practice 
versus  distributed  practice  have  strongly  indicated  the 
effectiveness of distributed practice. In her survey of the
literature, Mohr (1960) reported on 45 studies. "Of these
40 favored distributed practice, . . . 3 favored massed,
. . . 2 indicated no significant difference" (p. 326).
Singer  (1968)  reviewed  29  studies  pertinent  to  practice  and 
found  that  18  studies  demonstrated  superior  performance  with 
distributed  practice,  5 with  massed  practice,  and  6 had  no 
differences.  Lindquist  and  Witte  (1977),  as  a result  of  an 
extensive  review  of  massed  and  distributed  practice,  came  to 
the  conclusion  that  distributed  practice  is  particularly 
better  for  beginners  because  there  is  less  fatigue,  boredom, 
and  frustration  with  new  and,  therefore,  difficult  skills. 
Hawkins,  Wiegand,  and  Landin  (1985)  recommended  a 
distributed  practice  technique  for  teachers  by  suggesting 
that  they  provide  instruction  in  short  episodes  as  needed, 
alternating  with  time  for  practice,  thus  increasing  the 
number  of  individual  times  students  engage  in  a specific 
skill  or  activity. 
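
The vote counts reported by Mohr (1960) and Singer (1968) can be put on a common scale by converting them to proportions. The following is only an illustrative sketch: the counts come from the text above, while the dictionary layout and the function name `proportions` are assumptions introduced here and are not part of either review.

```python
# Study counts as reported in the text; the labels and the data layout
# are illustrative choices, not taken from the original reviews.
REVIEWS = {
    "Mohr (1960)": {"distributed": 40, "massed": 3, "no difference": 2},
    "Singer (1968)": {"distributed": 18, "massed": 5, "no difference": 6},
}


def proportions(counts):
    """Return each outcome's share of the studies in one review."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


for review, counts in REVIEWS.items():
    shares = proportions(counts)
    summary = ", ".join(f"{name}: {share:.0%}" for name, share in shares.items())
    print(f"{review} ({sum(counts.values())} studies): {summary}")
```

On these counts, roughly 89% of the studies surveyed by Mohr and 62% of those reviewed by Singer favored distributed practice. A simple tally of this kind ignores study quality and effect size, so it indicates only the direction of the evidence.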

Rodgers  (1936)  conducted  a study  to  determine  the 
most  effective  distribution  of  practice  for  acquiring  the 
skills  necessary  for  playing  a game.  Using  fifth  and  sixth 
graders,  she  compared  three  methods  of  teaching: 

1. Playing the game without any practice of
game technique

2. Practicing isolated game techniques for
90% of the class time and playing the
game for 10%

3. Practicing the techniques in relation to
a felt need for improving skill in those
techniques while playing the game
(play/practice/play/practice)

The results of the study were clear: Method 3 resulted in
significantly greater skill acquisition, it was more fun,
and  it  was  the  most  efficient  method  for  transmitting  the 
rules  and  strategies  of  the  game. 

One  of  the  concerns  physical  educators  have  is  the 
speed  at  which  a motor  skill  should  be  taught.  Should  the 
skill  be  taught  slowly  at  first,  making  it  easier  to  perform 
the  many  component  motor  patterns  necessary  for  efficient 
performance,  and  then  gradually  increased  to  the  speed  that 
is  necessary  for  proper  game  play?  Or,  should  the  skill  be 
introduced  at  game  speed,  possibly  making  it  more  difficult 
to  perfect  the  component  motor  skills?  Sage  and  Hornak 
(1978)  reported  that  either  method  appeared  successful.  The 
practice  can  be  constant  at  the  speed  at  which  the  students 
should  perform,  or  it  may  be  acquired  as  efficiently  through 
a gradual  and  progressive  increase  in  speed.  They  cite 
other  studies  supporting  this  somewhat  conflicting 
conclusion: Jensen (1975, 1976), who found that motor
skills  would  be  learned  more  naturally  on  a gradual  build-up 
of speed; Hornak (cited by Sage & Hornak, 1978),
Poppelreuter  (cited  by  Sage  & Hornak,  1978),  Solley  (1952), 
Woods  (1967),  and  Fulton  (1945),  who  stated  that  early 
emphasis  on  speed  during  practice  is  more  beneficial  if 
speed is the predominant factor in final performance and
that, if both speed and accuracy are important, an early
emphasis  on  both  seems  more  effective.  Fulton  explained 
that  when  a person  practices  at  a high  speed  and  then 
strives  for  accuracy,  the  movement  of  the  act  does  not  have 
to  be  changed,  but  that  transferring  low-speed  practice  to 
high  speed  requires  new  movement.  It  appears  that  students 
should  practice  a skill  at  the  speed  that  best  accommodates 
their  ability  to  acquire  that  skill  well  enough  to  be  used 
effectively  in  a real-life  situation.  For  example,  a 
greater  need  for  accuracy  than  speed  may  require  the 
instructor  to  slow  down  the  initial  performance  of  a skill 
in  order  to  emphasize  the  fine  motor  control  that  may  be 
necessary  for  accuracy  in  performance. 

Closely  allied  to  the  question  of  speed  is  the 
amount  of  variability  which  a subject  should  experience  when 
practicing  a task.  Bird  and  Rikli  (1983),  Cashin  (1983), 
Moxley  (1979),  Newell  and  Shapiro  (1976),  and  Zelaznick, 
Shapiro,  and  Newell  (1978)  all  supported  the  notion  of  the 
superiority  of  variable  practice  tasks  and  contexts  over 
constant practice for reducing error in the performance of a
skill.  A wide  variation  of  practice  tasks,  drills,  and 
practice  environments  helps  the  teacher  to  parallel  the 
constantly  shifting  flow  of  game  play  more  closely.  These 
varied  practice  conditions  should  result  in  more  effective 
performance  during  games  and  activities  requiring  use  of  the 
skills  taught. 

In  learning  motor  skills,  is  there  ever  a time  when 
mental  practice,  either  alone  or  in  combination  with 
physical  practice,  is  effective  in  improving  performance 
quality?  Mohr  (1960),  in  her  survey  of  research,  supported 
the  effectiveness  of  mental  practice.  Vandell,  Davis,  and 
Clugston  (1943),  studying  high  school  and  college  freshmen 
males  on  dart  throwing  and  basketball  free  throw  skills, 
indicated  that  while  there  was  practically  no  improvement 
with  no  practice,  both  mental  and  physical  practice  were 
equally beneficial. In Twining's (1949) study with college
men,  subjects  with  no  practice  showed  no  learning,  while 
mental  and  physical  practice  both  resulted  in  significant 
learning,  although  the  learning  from  physical  practice  was 
much  greater.  After  a survey  of  the  literature,  Richardson 
(1967)  supported  this  contention.  He  stated  that  physical 
practice  was  superior  to  mental  practice,  but  that  several 
studies  demonstrated  that  a combination  of  mental  and 
physical  practice  was  as  effective,  and  in  some  cases  more 
effective, than physical practice alone. He suggested that
the  use  of  learning  stations,  one  of  which  would  be  devoted 
to  mental  practice  (e.g.,  loop  films,  posters,  or  handouts) 
would  be  effective. 

The  possibility  of  carrying  over  motor  skills  from 
one  game  or  activity  to  another  is  of  keen  interest  in  the 
planning  of  effective  and  efficient  instruction.  If,  by 
practicing  one  type  of  skill,  a performer  can  establish  a 
foundation  for  a subsequent  activity  and  its  component 
skills,  this  would  indicate  the  need  for  a systematic 
progression  of  skills  and  activities  in  the  curriculum.  The 
task  specificity  of  motor  performance,  however,  is  well 
established.  In  his  review  of  the  literature,  Dunham  (1970) 
reported  that  there  is  general  agreement  that  one's  ability 
to  perform  a given  skill  does  not  dictate  one's  performance 
on  a subsequent  skill.  His  study  supported  task 
specificity,  even  when  skills  are  practiced  over  an  extended 
period  of  time  (24  days).  While  the  variation  between 
initial  and  final  reliabilities  ranged  between  .754  and  .883 
on  one  task  and  between  .923  and  .622  on  the  other  task,  the 
intertask  correlation  coefficients  were  near  zero.  His 
findings  supported  Mohr's  (1960)  conclusion  that  "skill 
learning  results  from  specific  practice  of  the  particular 
skill"  (p.  339).  It  follows,  therefore,  that  if  the  object 
of  a lesson  is  for  students  to  learn  a specific  skill,  they 
should practice that skill alone. The skill should also be
practiced  in  a manner  as  close  to  the  actual  game/activity 
situation  as  possible  in  order  to  facilitate  transfer  to  the 
game/activity.  The  implication  here  is  that  skills 
practiced  only  in  isolated,  artificial  drill  and  practice 
situations  must,  to  some  extent,  be  retaught  when  it  is  time 
to  perform  them  in  a game  or  activity. 

An  interesting  research  result  which  can  be  applied 
to  the  question  of  the  specificity  of  performance  is  given 
by  Giebink  and  McKenzie  (1985)  from  two  related  studies 
designed  to  examine  the  effects  of  three  intervention 
strategies  (instructions  and  praise,  modeling,  and  a point 
system)  on  children's  sportsmanship  in  physical  education 
and  recreation  settings.  They  found  that  while  all  three 
interventions  increased  sportsmanship  and  decreased 
unsportsmanlike  behaviors  in  the  initial  morning  softball 
class setting (where they were employed), the behaviors were
not  transferred  by  the  same  students  to  the  evening 
basketball  class.  In  other  words,  the  intervention 
strategies  resulted  in  more  effective  behaviors  in  the 
morning  class  where  they  were  taught,  but  no  difference  was 
demonstrated  in  the  evening  class  even  though  both  classes 
were  team  sports  classes. 

Although  researchers  have  pointed  out  this 
specificity  of  practice  and  the  lack  of  transfer,  a study  by 
Puretz (1983), conducted during instruction of a slightly
complex  movement  pattern,  supported  the  theory  of  bilateral 
transfer  (replication  by  the  individual  on  the  nonpreferred 
side  without  any  practice  after  demonstrating  and  then 
teaching to the preferred side). She concluded that
teachers  should  teach  to  the  nonpreferred  side  (usually  the 
left  side)  to  maximize  learning  through  bilateral  transfer. 
This  finding  was  supported  by  Finlayson  and  Reitan  (1976)  in 
a study  of  handedness  in  relation  to  measures  of  motor  and 
tactile-perceptual  functions.  Puretz  also  found  that 
transfer  was  aided  by  the  amount  of  practice  given  to  the 
students  after  the  initial  demonstration.  As  might  be 
expected,  5 minutes  of  practice  was  more  effective  than  only 
one  practice  trial. 

During practice sessions, the question of how much
teacher monitoring or involvement should occur becomes
important. In summarizing the monitoring behavior of
effective teachers from his comprehensive review of the
literature, Medley (1977) concluded:

The effective teacher spends more time working with
them [students] all in one large group . . . they do
spend some time in "independent study"; but their
teachers behave differently during this time than
ineffective teachers . . . they spend more time
checking . . . [and] they are less perfunctory when
they do so. . . . The ineffective teacher . . .
leaves them pretty much to themselves. (p. 18)

Close  adult  monitoring  of  students  working  in  groups 
or  individually  was  listed  by  Graham  and  Heimerer  (1981)  as 
one  of  the  effective  teaching  skills  that  physical  educators 
should  adapt  from  the  effective  teaching  process-product 
literature.  Tousignant  and  Siedentop  (1983)  conducted  an 
investigation  of  student  accountability  systems  used  by 
teachers  and  arrived  at  the  conclusion  that  a system  which 
keeps  students  on-task  does  not  exist  by  itself;  it  must  be 
implemented.  Students  learn  about  the  task  requirements 
from  the  teacher's  instructions  and  also  from  the 
consequences  made  contingent  upon  their  accomplishment  of 
the  tasks.  The  consistency  and  the  firmness  of  the 
monitoring  strongly  influence  student  behavior  and  also  have 
a direct  effect  on  student  participation  and  achievement. 
Tasks  tend  to  be  modified  to  the  degree  that  there  is  a lack 
of  consistency  between  the  focus  of  the  task  and  the  focus 
of  the  monitoring.  Tousignant  and  Siedentop  found  that  in 
some  classes  students  were  held  responsible  only  for  minimal 
participation.  Unexcused  absences,  inappropriate  clothing, 
and  failure  to  take  part  in  class  activities  were  accepted 
by  the  teacher  with  little  or  no  comment — a "show  up,  dress 
out,  stand  up"  (p.  54)  system.  They  identified  four  basic 
categories  of  student  behavior: 

1. Students engaged with the task-as-stated-by-the-teacher.

2. Students engaged in a "modified task." If the
task was too easy or too hard they changed tasks
by behaviors such as skipping parts and changing
rules.

3.  Students  engaged  in  "deviant  off-task  behavior." 
These  were  activities  which  actively  interfered 
with  the  lesson  such  as  talking,  fooling  around, 
or  fighting. 

4.  Students  acted  as  "competent  bystanders."  They 
avoided  participation  without  misbehaving.  They 
knew  how  to  use  the  class  format  to  hide  their 
low  level  of  participating.  They  stood  in  line, 
moved  back  and  forth  without  ever  engaging  in 
activity,  avoided  playing  major  roles  in  games, 
and  yet  became  temporarily  involved  when  the 
teacher  supervised  closely.  (p.  49) 

Tousignant and Siedentop (1983) found that when
students learned that no matter what the stated task was,
all they had to do was to follow the general directions and
participate as little as they wanted provided they remained
within the limits of the real task requirements, the
instructional system was, in reality, suspended. The real
task for the student often became the manipulation of the
system. Ultimately, the quantity and quality of the
monitoring had more effect on the task than the assignment.
However, when the effort demonstrated by the students was
the criterion used to determine grades, it seemed to
increase the proportion of students who tried hard and to
reduce, to a certain extent, the quantity of off-task
behaviors. As for the quality of the teacher's monitoring,
when the task included a measurement of the skill or
quantity of the task, the focus of teacher monitoring was
generally congruent with the task, as stated, and fewer
students  tried  to  avoid  involvement  in  the  tasks. 

Tousignant  and  Siedentop  (1983)  identified  four  main 
forms  of  monitoring  from  the  data  gathered  during  the  study: 
(a)  observing,  (b)  officiating,  (c)  observing  and  giving 
corrective  feedback,  and  (d)  making  permanent  records  of  the 
task.  Silent  and  distant  observation  mostly  prevented 
misbehavior,  while  permanent  recording  was  associated  with  a 
higher  rate  of  on-stated-task  behavior. 

A summary  of  the  studies  relating  to  practice 
indicates  a strong  need  for  practice  in  order  for  the 
learning  of  skills  to  be  at  an  optimal  level  of 
effectiveness.  Practicing  skills  to  the  point  of  habit 
formation  has  been  shown  to  increase  the  probability  of  the 
subsequent  use  of  these  skills.  Although  verbal  instruction 
is  effective  in  the  learning  of  skills,  it  is  not  as 
effective  in  skill  improvement  as  practice;  therefore, 
teacher  talk  should  not  result  in  a restriction  of  the 
amount  of  practice  time  available.  Techniques  of  effective 
time  management  and  class  organization  should  be  utilized  so 
that  maximum  participation  in  learning  tasks  is  guaranteed. 
The  tasks  themselves  must  be  appropriate  for  student 
capability  and  must  be  relevant  to  the  learning  task.  A 
success rate of 70 to 100% is indicated as effective by
research findings.

An effective practice session requires a clearly
defined  goal  or  purpose,  use  of  a carefully  structured 
skills  progression,  and  provision  for  feedback.  Warm-up 
exercises  may  have  a beneficial  effect  on  performance  if 
they  are  closely  related  to  the  skill  to  be  performed  and 
are  performed  extensively  rather  than  for  a very  short 
duration. 

Many  researchers  have  emphasized  the  value  of 
distributed  practice  over  massed  practice.  It  is  effective 
for  teachers  to  provide  instruction  in  short  episodes, 
allowing time for practice in between. A strategy of
play/practice/play/practice  was  effective  in  one  study. 
Mental  practice,  combined  with  physical  practice,  has  also 
been  shown  to  have  value  in  increasing  performance. 

The  speed  at  which  a motor  skill  should  be  taught 
does  not  appear  to  be  a strong  factor  in  learning  a skill, 
although  some  researchers  have  shown  that  skills  can  be 
learned more naturally on a gradual build-up of speed.
Skills  should  be  taught  at  the  speed  which  best  matches  the 
ability  of  the  students  to  acquire  that  skill  effectively. 

The  task  specificity  of  motor  performance  is  well 
documented.  One's  ability  to  perform  a given  skill  does  not 
directly  correlate  with  one's  performance  on  another  skill. 
If  the  object  of  a lesson  is  for  students  to  learn  a 
specific  skill,  they  should  practice  that  skill.  In  order 
to facilitate transfer to the game, the skill should be
practiced  in  a manner  as  close  to  the  actual  game  situation 
as  is  practical.  For  reducing  error  in  the  performance  of  a 
skill,  a wide  variation  in  the  types  of  practice  tasks  and 
drills  should  be  presented.  When  it  is  important  for  skills 
to  be  learned  on  both  sides  of  the  body,  teachers  should 
teach  to  the  nonpreferred  side  first  since  the  skill  will 
transfer  to  the  preferred  side  more  quickly  than  if  the 
skill  were  presented  to  the  preferred  side  initially  and 
then  transferred. 

The  effectiveness  of  practice  is  only  as  good  as  the 
consistency  and  firmness  of  teacher  monitoring.  Students 
learn  about  the  task  requirements  from  the  instruction  given 
and  also  from  the  consequences  stated  for  nonperformance. 
Poor  teacher  monitoring  can  result  in  students  modifying 
tasks  or  avoiding  participation  in  some  manner.  Some 
students  have  found  that  all  they  have  to  do  is  follow  the 
general  directions  and  appear  busy  whenever  the  teacher 
looks in their direction. Many students spend their time
manipulating  the  system  rather  than  participating  in  the 
assigned  task.  An  emphasis  on  student  effort,  task 
quantity,  or  measurement  of  the  skill  appeared  to  result  in 
greater  on-task  behavior  and  more  focused  teacher 
monitoring.  Methods  of  monitoring  might  include 
observation,  officiating,  giving  corrective  feedback,  and 
making permanent records of the task, the latter method
being associated with a higher rate of on-stated-task
behavior.

PRINCIPLE: If the teacher provides
practice sessions of appropriate length,
number, and distribution, performance
will improve.

PRINCIPLE: Practice sessions will
be more effective if the goals are
stated, if a carefully structured skill
progression is followed, if students
have maximum participation time, if the
success-level is high enough, if the
quality of student performance is
closely monitored, and if feedback is
provided.

PRINCIPLE: Skill practice must be
specific to the skill acquisition
desired; however, practicing a skill in
a variety of environments will result in
fewer performance errors.

PRINCIPLE: Warm-ups which are
closely related to the skill to be
learned may contribute to an increase in
performance.

PRINCIPLE: The speed at which a
skill is taught may have an effect on
its acquisition.

Feedback 

Studies  of  the  effect  of  feedback  on  learning  a 
motor  skill  have  been  popular  in  physical  education.  These 
studies  of  feedback  or  knowledge  of  results  (KR)  show  it  to 
be  the  strongest,  most  important  variable  affecting 
performance and learning (Presbie & Brown, 1977). Regarding
knowledge  of  results,  Bilodeau  and  Bilodeau  (1961)  reported 
that  "there  is  no  improvement  without  it,  progressive 
improvement  with  it,  and  deterioration  after  its  withdrawal" 
(p.  250).  Smoll,  in  his  1972  study,  confirmed  information 
feedback  (IF)  as  one  of  the  most  critical  variables 
affecting  the  acquisition  and  performance  of  motor  skills. 
Its  effectiveness  may  be  due  to  the  fact  that,  at  least  in 
some  cases,  feedback  statements  function  as  a type  of  goal 
or  objective.  Hollingsworth  (1975)  reported  on  the  effect 
of  specific  performance  goals  on  the  performance  of  a gross 
motor  skill.  She  found  that  a teacher's  feedback  statement, 
even  a very  abstract  one  such  as  "Try  harder"  or  "You  can  do 
it  better,"  served  as  a type  of  goal  for  the  performers  and 
had a positive effect on subsequent student performance.

To give good feedback, it is necessary to know how
feedback  or  knowledge  of  results  is  defined.  Bilodeau  and 
Bilodeau  (1961)  cited  Brown's  definition  of  KR  as  "the 
process  of  providing  the  learner  with  information  as  to  how 
good  or  how  accurate  his  reactions  are"  (p.  251).  Ammons 
(1961)  emphasized  the  need  for  specificity  in  feedback 
statements  by  concluding  that  "the  more  specific  the 
knowledge  of  performance  the  more  rapid  the  improvement"  (p. 
253).  Harney  and  Parker  (1972),  who  defined  social 
reinforcement  (SR)  as  visual  and  verbal  cues  in  the  form  of 
smiles  and  frowns  or  praise  and  reproof,  stated  that  any 
type  of  social  reinforcement  treatment  (positive  or 
negative)  by  a teacher  resulted  in  significantly  better 
performance.

The type of feedback received can be varied.
Presbie  and  Brown  (1977)  stated  that  praise,  videotape 
replay,  posting  scores,  or  photographs  can  all  be  used 
effectively.  In  an  analysis  of  the  use  of  videotape  replay 
in  the  teaching  of  motor  skills,  Rothstein  (1980),  reporting 
on  studies  done  earlier,  stated  that  no  significant 
differences  were  found  in  33  out  of  52  studies  between  or 
among  videotape  replay  conditions  and  other  experimental  or 
control  conditions.  Although  this  finding  might  generally 
lead  a reviewer  to  conclude  that  the  findings  in  the  area  of 
videotape  replay  were  inconclusive,  Rothstein  stated  that  a 
meta-analysis of the studies led to the following
conclusions on three critical variables:

Skill level: Advanced beginners and intermediates
benefited more than beginners.

Use of verbal cues: There were greater benefits when
performers were told what to look at in viewing the
replay.

Number of uses: Performers who had multiple
opportunities (5 or more times) to view a videotape
replay with interspersed practices benefited more.
(p. 59)

Lloyd  (1969)  confirmed  the  value  of  visual  feedback  when  he 
demonstrated  that  college  students  progressed  better  after 
viewing  loop  films  and  slow-motion  pictures  of  the  learner 
rather than after just receiving audio feedback.

Although  professional  experience  alone  would 
convince  us  that  effective  feedback  must  be  correct  and 
timely,  Malina  (1969)  supported  this  conviction  when  he 
reported  on  studies  in  motor  skill  learning  which 
demonstrated  that  providing  feedback  in  a timely  manner 
improved  performance  on  the  required  visual  and  motor 
skills;  moreover,  restricting  or  delaying  the  feedback 
impeded  the  acquisition  of  skills.  He  stated  that  accuracy 
is the key component in effective feedback: Feedback must
be specific and complete. Siedentop (1983) stated that 50%
to  70%  of  a teacher's  feedback  statements  should  contain 
information  that  is  specific  to  the  instructional  objective. 
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He  also  indicated  that  it  is  possible  for  a teacher  to 
deliver  at  least  four  feedbacks  per  minute. 
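
Siedentop's two benchmarks lend themselves to a simple tally. The Python sketch below is illustrative only. It assumes that an observer has coded each feedback statement in a timed interval as either "specific" or "general"; the function name, the coding labels, and the sample data are hypothetical and are not drawn from any published instrument.

```python
def summarize_feedback(codes, minutes):
    """Return (statements per minute, proportion coded 'specific')."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    total = len(codes)
    specific = sum(1 for code in codes if code == "specific")
    rate = total / minutes
    # Guard against an interval in which no feedback was given.
    proportion = specific / total if total else 0.0
    return rate, proportion


# Hypothetical 5-minute observation: 20 statements, 12 coded as specific.
codes = ["specific"] * 12 + ["general"] * 8
rate, proportion = summarize_feedback(codes, minutes=5)

# Compare against the two benchmarks attributed to Siedentop (1983).
meets_rate = rate >= 4.0                        # four statements per minute
meets_specificity = 0.50 <= proportion <= 0.70  # 50% to 70% specific
```

Treating the 50% to 70% figure as a band rather than a floor is a judgment call made for this sketch; a reader who interprets it as a minimum would drop the upper bound.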

Smoll  (1972),  in  his  survey  of  literature,  reported 
that  the  more  precise  information  feedback  (IF)  is  relative 
to  performance  on  a motor  task,  the  more  efficient 
acquisition  of  skill  will  be.  He  investigated  the  effect  of 
precise  feedback  relative  to  timing  during  the  acquisition 
of  a gross  motor  skill  of  tossing  a duckpin  bowling  ball  at 
a specified  velocity  equal  to  10%  of  the  subject's  maximum 
velocity.  The  experimenter  verbally  presented  IF 
immediately  after  each  delivery.  The  subjects  in  group 
1/100  received  quantitative  information  feedback  (IF) 
immediately  after  each  delivery,  accurate  to  hundredths  of  a 
second;  the  subjects  in  group  1/10  received  quantitative  IF, 
accurate to tenths of a second; and the subjects in group
qual  received  IF  in  qualitative  form  (i.e.,  "too  slow,"  "too 
fast,"  or  "correct").  The  study  revealed  a significant 
improvement  in  accuracy  (p  < .01)  for  all  three  groups; 
however,  the  mean  absolute  error  for  group  qual  was 
significantly  greater  than  the  means  for  group  1/100  and 
group  1/10.  The  results  indicated  a steady  trend  throughout 
the  study  favoring  the  groups  receiving  the  more  precise 
feedback.  Since  the  difference  between  the  1/100  and  1/10 
groups  was  not  significant,  Smoll  supported  Ammons'  (1956) 
conclusion that "there is an optimum specificity of
knowledge, and that additional knowledge will not improve
performance"  (p.  287).  This  finding  also  indicates  the 
importance  of  considering  precision  of  information  feedback 
not  only  in  terms  of  what  is  meaningful  to  performers,  but 
also  in  terms  of  what  the  student  is  capable  of  using. 
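
The three feedback conditions and the mean absolute error measure can be made concrete with a short Python sketch. Everything in it is assumed for illustration: the target time, the trial times, the tolerance used to call a delivery "correct," and the function names are hypothetical and do not reproduce Smoll's data or procedure.

```python
from statistics import mean


def mean_absolute_error(times, target):
    """Average size of the timing error, ignoring its direction."""
    return mean(abs(t - target) for t in times)


def feedback_statement(error, precision, tolerance=0.05):
    """Format a signed timing error (in seconds) at one of three precisions.

    A positive error means the delivery took longer than the target time,
    that is, the ball was too slow. The tolerance is an assumed value.
    """
    if precision == "1/100":
        return f"{error:+.2f} s"
    if precision == "1/10":
        return f"{error:+.1f} s"
    if precision == "qual":
        if abs(error) <= tolerance:
            return "correct"
        return "too slow" if error > 0 else "too fast"
    raise ValueError(f"unknown precision: {precision}")


# Hypothetical delivery times (seconds) against a hypothetical 2.00 s target.
target = 2.00
trials = [2.23, 1.85, 2.04, 1.98]
mae = mean_absolute_error(trials, target)

# The same first-trial error, as each group would have heard it.
statements = [
    feedback_statement(trials[0] - target, p) for p in ("1/100", "1/10", "qual")
]
```

The sketch shows why the two quantitative conditions may differ little in practice: both convey the direction and approximate size of the error, whereas the qualitative statement conveys direction only.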

To  be  effective,  feedback  must  be  understood  by  the 
person  performing  the  skill.  Reeve  and  Magill  (1981)  found 
that  before  knowledge  of  results  (KR)  can  serve  as  the 
external  standard,  the  learner  must  first  develop  an 
understanding  of  the  information  contained  in  the  KR 
statement.  This  finding  brings  up  a question  as  to  how 
specific  a KR  statement  should  be  and  indicates  that  there 
is  a limit  to  the  specificity  of  KR  which  can  be  effectively 
communicated  to  a learner.  Siedentop  (1983)  affirmed  that 
the  more  highly  skilled  a student  becomes,  the  more  he  or 
she  can  benefit  from  highly  precise  and  technical 
information  in  the  feedback  statement. 

Ammons  (1956),  after  an  intensive  review  of  the 
literature  then  available  on  the  effectiveness  of  feedback 
in  a task  or  performance  setting,  developed  11 
generalizations  regarding  feedback,  5 of  which  are  of 
particular  note  for  those  looking  for  effective  teaching 
techniques  or  principles.  Only  those  generalizations  which 
were supported by more than one study and which have a
direct bearing upon teaching effectiveness are reported
here.

1.  The  more  specific  the  knowledge  of 
performance,  the  more  rapid  the  improvement  and  the 
higher  the  level  of  performance.  (p.  287) 

The more specifically a subject knows how he or she has
performed, the more likely he or she is able to make
appropriate corrections. Ammons did note an exception to
this finding and commented that there was an optimum
specificity of knowledge (as noted previously). The point
of optimum specificity of knowledge of performance is
related to some extent to the stage of learning. At the
start of learning a new task the subject can put to use
little information, but as learning proceeds, he or she is
able to use more and more precise feedback.

2.  Knowledge  of  performance  affects  rate  of 
learning  and  level  reached  by  learning.  (p.  283) 

The researchers almost universally reported that where
knowledge of performance is given to one group and withheld
from or reduced for another group, the former group learns
more rapidly and reaches a higher level of proficiency.

3. Knowledge of performance affects motivation.
(p. 285)

Students were generally more alert and reported enjoying
themselves when they were given knowledge of their
performance. Ammons cautioned, however, that motivation
"does not always increase with increasing knowledge"
(p. 285).

4.  The  longer  the  delay  in  giving  knowledge  of 
performance,  the  less  effect  the  given  information 
has.  (p.  287) 

Ammons  did  suggest  that  "there  is  an  optimum  delay  for  every 
task  and  every  stage  of  learning"  (p.  289).  If  the 
knowledge  comes  too  soon,  the  performer  may  not  have  time  to 
judge  his  or  her  own  motor  patterns.  In  some  instances, 
therefore,  a slight  delay  of  knowledge  might  allow  the 
performer  to  make  a better  overall  evaluation  of  his  or  her 
performance.

5. When knowledge of performance is decreased,
performance drops. (p. 290)

When KR is reduced, students will gradually modify or drop
the assigned task; as a result, performance quantity and
quality decrease.

Feedback  alone  does  not  ensure  knowledge 
acquisition.  There  are  other  factors  which  influence  the 
effectiveness  of  feedback.  Yerg  (1980)  reported  that  group 
instruction  of  specific  information  and  then  feedback  to  a 
single  student  on  the  entire  motion  was  positively  related 
to  pupil  achievement,  while  student  simultaneous  practice 
with  the  teacher  talking  and  providing  detailed  verbal 
feedback  was  negatively  related.  She  concluded  that  80%  of 
the variation in results was explained by the learners'
readiness for instruction and that the appropriateness of
teacher  behaviors  depended  not  only  on  the  expected  learning 
outcomes,  but  also  on  the  readiness  of  the  learners  to 
benefit  from  specific  teacher-learner  interactions. 

Several  studies  have  dealt  with  the  effectiveness  of 
corrective  feedback.  Blakemore  (1985)  investigated  the 
effects  of  mastery  learning,  as  proposed  by  Bloom  and  his 
associates,  upon  the  psychomotor  domain.  The  mastery 
learning  techniques  were  defined  as  (a)  the  provision  of 
definite  standards  of  achievement,  (b)  enough  time  to  learn, 
and  (c)  appropriate  corrective  feedback  and  help  for 
unsuccessful  students.  Statistical  analyses  revealed  a 
significant effect for both the mastery group and the
nonmastery group after 12 weeks (although the mastery group was
significantly higher in the midtest). Although there was no
significant  difference  between  the  groups,  she  concluded 
that  low-aptitude  students  and  all  females  benefited  from 
the  conditions  provided  by  mastery  learning  methods.  The 
benefit  was  particularly  noted  in  the  positive  attitudes 
about  the  Mastery  class  by  these  students.  Although  the 
corrective  feedback  was  not  compared  separately  from  mastery 
learning  techniques  (a)  and  (b)  (above),  it  was  a very 
viable  component  of  the  experimental  effect  and  thus  has 
implications for skill learning.

While lending support to the concept that corrective
feedback  may  increase  student  achievement,  some  studies  have 
noted  a difference  in  effect  for  male  and  for  female 
subjects.  Cox  (1983)  studied  the  relationship  between 
arousal  and  the  performance  and  learning  of  a pursuit  rotor 
task.  Subjects  were  divided  into  three  groups  for  the 
study.  The  control  group  received  no  feedback,  negative  or 
positive;  the  second  group  received  failure-feedback  (the 
subjects  were  told  they  were  performing  very  poorly  whether 
or not they were); and the third group received an electric
shock supposedly when their performance was considered poor
(the shock was given indiscriminately). Data indicated that
males  were  not  affected  by  any  of  the  treatments  and  that 
females  were  affected  only  by  the  failure-feedback,  and  that 
effect  was  a negative  one.  Cox  felt  there  were  two  reasons 
for  this  result.  First,  the  ego-involving  instructions 
involved  in  failure-feedback  caused  greater  levels  of 
anxiety  than  no  feedback  or  the  threat  of  a physical  shock. 
Second,  since  no  difference  was  found  in  the  males,  failure 
feedback  undermined  the  self-confidence  of  the  females  more 
than of the males in the study. He referred to Lenney's
(1977)  statement  that  "women's  self-confidence  is  often 
lower  than  men's  when  there  are  suggestions  that  their  work 
will  be  compared  with  others'"  (p.  227).  McCaughan  and 
McKinlay (1981), in their study of female high school
students, also confirmed the value of success- versus
failure-feedback  and  further  demonstrated  that  tangible 
awards  yielded  no  change. 

This  may  lead  some  to  assume  that  corrective 
feedback  is  negative  feedback,  but  Siedentop  (1983)  stated 
that  corrective  feedback  is  not  necessarily  negative;  it  can 
be  delivered  in  a positive  manner.  He  did  warn  the  reader 
that,  "If,  over  a period  of  time,  a teacher  reacts 
exclusively  to  errors  in  student  performance,  an  error- 
centered  climate  is  created"  (p.  197).  It  appears,  though, 
that  corrective  judgments  may  not  be  used  extensively  by 
teachers.  Tobey  (1974)  found  that  a large  proportion  of 
teacher  judgments  contained  no  appraisals  at  all.  He  did 
discover  that  the  more  experienced  the  teachers  became,  the 
more  they  made  judgments  about  performance.  He  also 
discovered  that  larger  classes  received  less  evaluative 
feedback  when  compared  to  medium-  or  small-sized  classes. 

Another  type  of  feedback  that  might  negatively 
affect  the  benefit  a student  may  receive  from  a teacher 
statement  is  harshness  or  roughness  in  the  teacher's  tone  of 
voice.  In  a comprehensive  study  of  student  desist 
techniques,  Kounin  (1970)  reported  that  rough  comments  made 
by  teachers,  in  the  process  of  managing  student  behavior, 
consistently  had  adverse  reactions  in  the  classroom.  Harsh 
desist behaviors resulted in student discomfort and less
concern with school matters. This negative effect held true
for  all  grade  levels,  kindergarten  through  college.  Simple 
reprimands  or  alternative  desist  methods  led  to  greater 
student  attention  to  the  lesson.  The  implication  is  that 
harshness  in  the  teacher's  tone  of  voice,  while  making  a 
feedback  statement,  might  prevent  the  student  from  attending 
to  the  instructional  process. 

Although  the  positive  value  of  feedback  is  well 
established,  one  might  ask  if  there  is  a time  in  the 
learning  process  when  feedback  is  of  greater  value  in 
improving  skill,  or,  conversely,  a time  when  feedback  might 
impede  the  learning  task.  Both  Marteniuk  (1976)  and  Reeve 
and Magill (1981) conducted investigations related to this
question. They found that greater attention to movement is
required as the performer initially learns a motor skill and
that, as the skill level improves, the performer is required to
devote less conscious attention to movement. Marteniuk
suggested,  therefore,  that  the  teacher  limit  talking  or 
feedback  during  the  initial  phases  of  motor  skill  learning 
by  the  students. 

Techniques which provide students their own feedback,
such as public display of the number of laps completed
(Rushall & Pettinger, 1969), were effective. This could be
because such techniques provide students with a form of
self-direction or goal-setting, an effective teaching
technique for keeping students on-task (Graham & Heimerer,
1981).

Nixon  and  Locke  (1973)  found  that  if  a student 
studied  the  incorrect  performance  of  another  performer,  it 
could  help  improve  the  performance  of  the  observer.  This 
finding  is  consistent  with  Levin's  (1985)  findings  on  peer 
tutoring  in  which  he  stated  that  both  students  (the  student- 
teacher  and  the  student-learner)  experience  an  increase  in 
academic  achievement  as  a result  of  the  peer  teaching 
process.

The  value  of  specific  academic  praise  has  been  well 
documented  in  the  body  of  literature  on  effective  teaching. 
Through  their  meta-analysis  of  studies  on  the  effects  of 
reinforcement  on  classroom  learning,  Lysakowski  and  Walberg 
(1981)  ascertained  that  if  praise  is  appropriately  used  so 
that  its  reinforcement  power  is  optimum,  it  can  account  for 
a 30-percentile  difference  in  scoring  on  learning  outcomes. 
Martens,  Burwitz,  and  Newell  (1972)  concluded  from  their 
survey  that  both  positive  social  reinforcements  (e.g., 
praise  and  smiles)  and  tangible  reinforcements  (e.g.,  money 
and  candy)  have  been  effective  in  changing  the  response  rate 
of  subjects  performing  motor  tasks  which  emphasized  speed  or 
the  quantity  of  responses.  They  showed  that  while  social 
and  tangible  reinforcements  have  no  effect  on  the  early 
practice trials of a qualitative motor task, they do
facilitate performance after the skill is learned. They
commented  that  although  social  reinforcement  may  not  play  a 
significant  role  in  changing  responses  on  qualitative  motor 
tasks,  it  may  serve  the  important  function  of  keeping  the 
individual  at  the  task.  "A  little  praise  may  go  a long  way 
towards  maintaining  the  individual's  behavior  rather  than 
have  him  abandon  the  task"  (p.  441).  Another  important 
conclusion  from  this  study  was  that  social  reinforcement  may 
play  an  important  part  in  maintaining  a warm,  convivial 
classroom  environment. 

Another  area  which  has  been  reported  as  affecting 
student  achievement  is  the  expectation  level  of  the  teacher 
or  administrator.  Graham  and  Heimerer  (1981),  in  their 
research  on  the  effect  of  teacher  expectations,  reported 
that  achievement  was  higher  when  the  teacher  expected 
quality  movement  from  the  students  and  not  simply  good 
attempts.  Because  of  the  expectation  level  stated  by  the 
teacher,  a climate  which  communicates  the  following  message 
to the students is created: "I expect you to work hard
because I know you can learn what is expected of you"
(p. 19). Teachers who are high in expectancy also tend to
communicate  higher  performance  expectations  by  assigning 
more  homework  and  moving  through  the  curriculum  at  a faster 
rate  than  low-expectant  teachers.  When  working  with  low 
socioeconomic students, more effective teachers also have
high expectations. They tend to be more patient and
encouraging;  and  they  demonstrate  a greater  willingness  to 
develop  personal  rapport  with  the  students  and  to  teach  and 
reteach. 

What  are  the  strategies,  other  than  an  oral 
statement,  that  a teacher  uses  to  communicate  a high  level 
of  expectation  to  students?  Martinek  (1981),  in  proposing  a 
model  for  the  communication  of  teacher  expectations  in 
physical  education,  stated  that  the  research  indicated 
teacher  expectations  not  only  determine  the  performance  of 
certain  types  of  students,  but  also  serve  to  sustain  low  and 
high  levels  of  performance  since  teachers  consciously  or 
unconsciously  exhibit  preferential  behavior  toward  some 
students  in  their  classes  as  a result  of  certain 
expectations  they  hold.  This  preferential  behavior  can  be 
either  negative  or  positive  and  can  be  communicated  in  a 
number  of  ways.  Positive  forms  of  nonverbal  behavior  may 
include  a nod,  wink,  smile,  or  a pat  on  the  back;  other  less 
overt  actions  may  include  use  of  a student's  ideas  in  class, 
selecting  a student  as  squad  or  team  leader,  rotating 
students  so  that  everyone  gets  to  play  all  positions  in 
various  sports  and  activities,  or  simply  allowing  students 
sufficient time to respond to a particular direction or
question.

Martinek (1981) cited Crowe's study into the effects
of  teacher  behavior  towards  students  (behavior  as  an  action 
springing from teacher expectation). Crowe used four
variables from the Brophy and Good dyadic interaction
analysis system (1970), namely climate, feedback, output, and
input, and added a fifth of her own, touch. The results of
the  study  showed  that  high-expectancy  students  (a)  received 
more  praise  and  encouragement,  (b)  received  greater 
acceptance of their ideas, (c) had more teacher contact,
(d) received greater pause time, and (e) had more response
opportunities.  Crowe  found  no  significant  differences 
between  high  and  low  achievers  in  terms  of  the  type  of  new 
material  taught  or  the  frequency  with  which  the  teachers 
touched  their  students. 

Negative  behaviors,  Martinek  (1981)  reported, 
included ignoring the student or his or her efforts, or
withholding  meaningful  feedback.  Consequently,  said 
Martinek, 

These  forms  of  communication  become  prophetic,  so 
that  students  perform  in  accordance  with  their 
perceived  expectations  of  the  teacher.  Because  of 
this,  the  teacher  is  considered  to  be  a major  factor 
in  determining  what  makes  the  teaching  act  either 
fruitful  or  subversive  for  the  student.  (p.  59) 

Similar  findings  were  reported  by  Martinek  and 
Johnson  (1979),  who  investigated  the  effects  of  teacher 
expectations on specific teacher-student behaviors during
elementary physical education instruction. The results of
their  data  analysis  showed  that  the  high  expectancy  group 
received  significantly  more  encouragement,  acceptance  of 
ideas,  and  analytic-type  questions  (p  < .01  for  all  three 
categories).  They  also  found  that  in  three  of  the  five 
classes  studied,  expected  high  achievers  (as  indicated 
before  the  experimental  treatment)  were  significantly  higher 
in  self-concept  than  the  low  achievers,  indicating  a 
possible cause-and-effect relationship between communicative
variables  and  psychological  growth. 

There  is  a danger,  nevertheless,  in  a teacher 
setting  expectation  levels.  It  is  possible  for  a teacher  to 
establish  levels  of  expectation  based  on  criteria  which  are 
not valid. One such criterion may be student
gender. Crowe (cited by Martinek, 1981), from her study
of  junior  high  school  students,  ascertained  that  both  male 
and  female  teachers  tended  to  expect  better  physical 
performance  from  boys  than  girls  during  physical  education 
instruction.  This  finding  must  be  generalized  with  care 
since  Martinek  and  Johnson  (1979)  found  that  student  gender 
had  little  effect  on  teachers'  expectations  for  elementary 
age  children  (fourth  and  fifth  graders). 

Babad,  Inbar,  and  Rosenthal  (1982a)  directed  their 
attention  to  the  relationships  of  teachers  and  students. 
Three groups of students were selected for their study:
(a) students the teachers felt had unusually high potential
for physical performance; (b) students the teachers felt had
unusually  low  potential  for  physical  performance;  and 
(c)  students,  selected  at  random,  who  were  alleged  (by  the 
researchers)  to  show  potential  for  unusually  good  physical 
performance.  After  taking  a test  designed  to  predict 
susceptibility to bias (Babad, 1979), the teachers in the
study were separated into high-bias and low-bias groups in
order  to  compare  both  the  positive  and  negative  effects  of 
teacher  expectancy  and  to  trace  differential  expectancy 
effects.  The  teachers  were  observed  on  four  behavioral 
variables: nondogmatic, responsive, criticizing, and
friendly. Data indicated that the low-bias teachers had
relatively  uniform  behaviors  toward  all  three  groups  of 
students,  while  the  high-bias  teachers  behaved  much  more 
dogmatically  toward  the  students  they  perceived  to  be  of  low 
potential  and  also  manifested  more  overall  dogmatic  behavior 
than  low-bias  teachers  (p  < .001).  Similar  results  were 
found  in  responsive,  criticizing,  and  friendly  behavior. 

In  looking  at  the  effect  of  high-  and  low-bias 
teachers'  expectations  on  the  students'  physical  performance 
scores  on  three  skill-related  fitness  tests  (standing  long 
jump,  shuttle  run,  and  either  sit-ups  for  girls  or  push-ups 
for  boys),  the  data  indicated  the  same  type  of  results  as 
had been achieved on behavioral variables (i.e., the
unbiased teachers obtained similar performance from all
three  groups,  while  the  high-bias  teachers  tended  to  obtain 
noticeably  better  athletic  performance  from  students  whom 
they  expected  to  perform  better  and  worse  athletic 
performance  from  students  whom  they  expected  to  perform 
worse). Students of the low-bias teachers scored in a
rather close range (-.05, -.09, +.07), while the scores of
students of the high-bias teachers varied a great deal
(+.43, -.57, +.21).
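
The contrast between the two sets of scores can be checked with simple arithmetic. The Python sketch below uses only the six values quoted above; the text does not state the metric of these scores or which value belongs to which expectancy group, so the lists are treated as unlabeled triples, and the variable and function names are hypothetical.

```python
from statistics import pstdev

# Scores as quoted in the text; group labels within each triple are not given.
low_bias = [-0.05, -0.09, 0.07]
high_bias = [0.43, -0.57, 0.21]


def score_range(scores):
    """Distance between the highest and lowest score."""
    return max(scores) - min(scores)


low_range = score_range(low_bias)
high_range = score_range(high_bias)

# Population standard deviation as a second, assumed measure of spread.
low_spread = pstdev(low_bias)
high_spread = pstdev(high_bias)
```

On these figures the range is about 0.16 for students of low-bias teachers and about 1.00 for students of high-bias teachers, roughly a sixfold difference in spread.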

In an article which cited this and other studies
(Rosenthal & Babad, 1985), four conclusions were drawn.
These are as follows:

1. Teachers tend to treat more favorably and obtain
superior performance from students for whom they
have more favorable expectations. Teachers tend
to treat less favorably and obtain inferior
performance from students for whom they have
less favorable expectations.

2. These effects of interpersonal expectations
occur not only in classrooms . . . but in
gymnasiums as well. Athletic performance . . .
can be affected by others' expectations.

3. The evidence suggests that these effects . . .
are brought about partly by the ways in which
expecters treat their [students].

4. Not all teachers are equally susceptible to the
biasing effects of . . . expectations. (pp. 38-39)

Babad,  Inbar,  and  Rosenthal  (1982b)  also  dealt  with 
the  question  of  how  teachers  develop  the  expectations  they 
hold  for  certain  students.  They  observed  26  high  school 
physical education teachers as they interacted with three
types of students (i.e., expected low potential, expected
high potential, alleged high potential, as described
previously). The first finding of the study was quite
dramatic,  indicating  a substantial  difference  between  high- 
bias  teachers  and  low-bias  teachers  in  the  process  of 
establishing  the  three  expectancy  groups.  Low-bias  teachers 
nominated  students  only  according  to  their  grades  in 
physical  education — which  would  appear  to  be  the  most 
appropriate  base  for  a rational,  unbiased  selection.  As  for 
the  high-bias  teachers'  nominations,  five  of  the  six 
characteristics  which  were  measured  in  the  study  were  found 
to  be  significantly  related  to  their  selections,  with 
especially  strong  effects  associated  with  aspects  of  the 
students'  appearance.  The  characteristics  and  levels  of 
significance  for  the  high-bias  teachers  were  socioeconomic 
status  (p  < .001),  quality  of  clothing  (p  < .001),  physical 
attractiveness  (p  < .001),  grade  in  physical  education 
(p  < .001),  ethnic  origin,  and  overall  academic  achievement 
(p  < .001).  In  other  words,  the  high-bias  teacher  was  more 
likely  to  select  a student  as  a potential  high  achiever  if 
the  student  was  of  higher  socioeconomic  status,  wore  better 
clothing,  was  attractive  physically,  and  made  high  grades  in 
physical  education  and  other  subjects. 

No  discussion  of  feedback  would  be  complete  without 
a look at the basis for effective feedback: skillful and
knowledgeable observation. The teacher must be able to
attend  to  all  the  critical  elements  in  a performance  and 
then  communicate  comments  and  suggestions  to  the  student  in 
an  effective  manner.  In  spite  of  consistent  agreement  that 
skillful  observing  is  critical  to  effective  teaching  in 
physical  education,  little  is  actually  known  about  it  as  a 
teaching  skill  (Barrett,  1983).  Barrett  stated  that  the 
skill  of  observing  is  seldom  taught  in  methods  classes,  a 
statement  which  is  interesting  in  a profession  that  has 
designed  exacting  systems  to  be  sure  its  researchers, 
officials,  and  judges  are  skillful  observers.  Barrett  then 
asked,

Is teaching so different? Is it less important than
judging, officiating, or researching? . . .
observing is an integral part of this discipline
[teaching] . . . observing may be the teaching skill
around which all other skills depend, and then again
it may not be. (p. 29)

In  her  survey  of  studies,  she  reported  that  at  least  two 
researchers  (Robertson  & Halverson,  cited  by  Barrett,  1983) 
describe  observing  as  one  of  three  components  of  teaching, 
the  other  two  being  interpreting  and  decision-making.  She 
suggested  that,  as  teachers,  we  may  be  poor  observers 
because  our  attention  is  consistently  being  diverted, 
something which we seem unable to control. Barrett stated
that "observing, as it occurs in the teaching environment,
is hypothesized as having three basic components: deciding
what to observe, planning how to observe, and knowing what
factors influence the ability to observe" (p. 23).
Suggestions given by Barrett to improve observational skills
include selecting the specific movement(s) to be observed,
knowing which features of the movement(s) are critical to
effective  performance,  and  moving  into  a position  that 
facilitates  effective  viewing  of  those  features. 

Although  effective  observation  appears  to  be  an 
important  teaching  skill,  researchers  have  not  yet  devised 
an  effective  technique  for  observing  a teacher  performing 
this  skill.  Although  assessing  the  correctness  of  a 
teacher's  feedback  statements  might  be  one  way  of  evaluating 
the  effectiveness  of  that  teacher's  skill  in  observing, 
effective  or  ineffective  feedback  by  a teacher  can  be  due  to 
variables  other  than  the  quality  of  teacher  observation 
(e.g., quantity of teacher observation, communication
ability).

In  summary,  studies  of  feedback  show  it  to  be  the 
strongest  and  most  important  variable  affecting  student 
performance  and  learning.  It  is  probable  that  feedback  acts 
as  a type  of  goal  or  objective,  thus  providing  direction  for 
the  student  in  subsequent  activity.  A variety  of  feedback 
techniques have proven to be effective: praise, film,
videotape replay, posting scores, photographs, and verbal
comments. A good feedback statement will give the performer
specific information as to how good or accurate the
performance  was  since  the  level  of  specificity  has  also  been 
shown  to  affect  subsequent  performance  on  a task.  In 
addition,  the  information  must  be  correct,  be  given  in  a 
timely  manner,  and  be  understood  by  the  person  performing 
the  skill.  Corrective  feedback,  although  it  has  been 
demonstrated  to  improve  student  performance,  appears  to 
affect  male  and  female  subjects  differently,  with  females 
performing  less  well  than  males  after  failure-feedback. 
Harshness  or  roughness  in  the  teacher's  tone  of  voice  can 
hinder  the  effectiveness  of  feedback.  Public  display  of 
achievement  has  been  shown  to  be  helpful  since  it  provides 
students  with  a form  of  self-direction  which  has  proved  to 
be  an  effective  teaching  technique  for  keeping  students  on- 
task.  A form  of  peer  tutoring  has  also  been  shown  to  help 
improve  performance. 

The  value  of  specific  academic  praise  has  been  well 
documented in a variety of studies. Social reinforcements
(e.g., praise and smiles) and tangible reinforcements (e.g.,
money and candy) have had a facilitative effect on
performance after the initial
practice  trials.  The  use  of  praise  may  function  best  as  a 
means  of  keeping  students  involved  in  the  task  longer  than 
if  praise  were  not  used. 

The  level  of  expectation  that  a teacher  has  for 
student achievement has been proven to be prophetic.
Students tend to perform in accordance with their perception
of  the  teacher's  expectations.  A teacher  with  a high  level 
of  expectation,  therefore,  can  have  a positive  effect  on 
performance.  In  this  case  a climate  is  created  that 
communicates  the  teacher's  confidence  that  the  students  can 
learn  what  is  presented.  Teachers  who  are  high  in 
expectancy  assign  more  work  and  move  through  instruction 
more  quickly,  but  they  also  develop  personal  rapport  with 
the  students  through  behaviors  such  as  a nod,  wink,  smile, 
or  pat  on  the  back.  They  may  also  use  students'  ideas  in 
class,  use  them  as  leaders,  or  rotate  students  so  that 
everyone  gets  to  play  the  preferred  positions  in  a game  or 
sport.  These  teachers  are  also  willing  to  teach  and  reteach 
until  the  students  perform  well.  Teacher  expectations, 
however,  can  work  against  student  performance.  If  a teacher 
has  judged  that  a student  will  not  perform  well,  he  or  she 
might  ignore  the  student  or  his/her  efforts,  or  withhold 
meaningful  feedback,  thus  making  it  more  difficult  for  the 
student  to  achieve.  Teachers  holding  low  expectations 
sometimes  arrive  at  their  level  of  expectancy  through 
placing a greater emphasis on aspects such as students'
appearance, quality of clothing, grades, and socioeconomic
status rather than achievement in physical education alone.

PRINCIPLE: If the teacher combines skilled observation
techniques with feedback, achievement will be increased.

PRINCIPLE: Student achievement will be higher if the
teacher expects quality movement, not simply good attempts.

PRINCIPLE: Teacher feedback which is accurate, complete,
immediate, and specific will result in more efficient
skills learning. Feedback which improves some form of
self-direction will enhance learning.

PRINCIPLE: Corrective feedback is effective in student
learning. Success-feedback may result in greater
achievement than feedback given only in failure situations
(criticism only), or feedback given in a rough or harsh
tone. In failure situations, the use of informational
statements only, with no judgment (including criticism),
should positively affect student learning.

Generalizations Based on
the Review of Literature

It  was  apparent  from  the  number  of  studies  reported 
that  teachers  can,  and  do,  affect  the  achievement  of  their 
students  and  that  teacher  behaviors  can  have  positive  or 
negative  effects  on  student  behavior  and  achievement  in  an 
activity  situation.  It  was  also  apparent  that  the  amount  of 
time  spent  on  skills  instruction  and  practice  had  a strong 
influence  on  the  quality  of  student  achievement.  Data  from 
studies  indicated  that  academic  learning  time-physical 
education  was  affected  primarily  by  the  nature  of  the 
activity,  the  amount  of  activity  time  available,  and  the 
efficient  use  of  activity  time.  It  was  also  ascertained 
that  it  was  the  amount  of  appropriate  student  involvement 
with  the  subject  matter  that  was  positively  related  to 
increases  in  student  learning  in  physical  education  classes 
and  in  the  acquisition  of  motor  skills  and  not  simply  the 
amount  of  time  scheduled  for  physical  education. 

The  need  for  student  time-on-task,  yet  the 
requirement  that  the  time  be  spent  on  appropriate 
activities,  pointed  up  the  need  for  effective  use  of 
specific  teacher  behaviors  which  had  been  shown  to  affect 
the  amount  of  time  students  were  on-task-as-stated  during  a 
lesson.  Siedentop  (1976)  demonstrated  that  the  time  spent 
by the teacher on managerial episodes reduced the amount of
time available for activity. He found that the amount
of  time  a teacher  spent  on  management  could  easily  be 
reduced  from  over  10  minutes  during  a 35-minute  period,  to  1 
minute  or  less.  The  training  included  a series  of 
strategies  involving  specific  teacher  behaviors  such  as 
beginning  the  class  on  time,  taking  roll  within  1 minute, 
and  using  enthusiasm,  hustle,  and  verbal  reminders  of 
appropriate  behavior.  A few  researchers  emphasized  the 
importance  of  teacher  behavior  in  keeping  students  on-task. 
Data  indicated  that  the  amount  of  active  or  passive 
supervision  engaged  in  by  the  instructor  was  a key  factor  in 
maintaining  student  task  involvement.  Also  reported  was  the 
need  for  teachers  to  have  equipment  set  up  with  a minimum  of 
lost  time,  papers  handed  out  quickly,  and  other  routine 
tasks  done  promptly. 

Based  on  the  research  reviewed  in  the  second 
section,  the  use  of  goals  and  objectives,  reviews  of 
previously  learned  material,  effective  use  of  questions  to 
check  student  comprehension,  and  not  digressing  from  the 
subject  matter  all  contributed  to  the  structure  that  an 
effectively  planned  lesson  must  have.  Although  the  research 
findings  related  to  the  value  of  goals  and  objectives 
demonstrated  some  inconsistency  across  studies,  researchers 
affirmed  that  the  use  of  specific  goals  increased  on-task 
student behavior and the level of achievement. Review of
subject matter at the beginning of new lessons, and at
monthly  and  weekly  intervals,  was  demonstrated  to  be  an 
effective  instructional  variable. 

There  was  a question  as  to  how  teacher  verbalization 
(lecture)  could  be  used  most  effectively  in  the  learning  of 
motor  skills.  From  the  studies  it  appeared  that  oral 
guidance  by  the  teacher  could  accelerate  the  learning  of 
motor  skills,  but  that  it  should  not  interfere  with  needed 
practice  time.  Lecture  sessions  should  be  brief  and 
interspersed  between  opportunities  for  motor  responses. 

Research  findings  in  physical  education  indicated 
strong  support  for  the  use  of  demonstrations.  There  were 
specific  teacher  behaviors,  however,  which  could  affect  the 
efficacy  of  a demonstration.  These  behaviors,  among  others, 
included the following: demonstrating at the beginning of a
learning task, presenting correct demonstrations, sequencing
demonstrations  appropriately,  using  verbal  cues,  controlling 
input  cues,  checking  student  comprehension,  and  covering 
safety  points. 

The  data  reported  in  the  section  on  practice  clearly 
indicated  that  practice  was  a key  component  in  learning 
motor  skills,  both  simple  and  complex.  Mohr  (1960),  in 
reporting  on  30  studies,  stated  that  although  specific 
instruction  resulted  in  the  learning  of  skills,  without 
practice there was practically no improvement. It is
important to note, however, that learning was related to the
number  of  trials  attempted  by  a performer  and  not  simply  to 
the  amount  of  the  time  spent  in  the  activity.  Research  data 
revealed  that  teachers  should  break  up  their  oral 
instructional  time  (talking,  lecturing,  demonstrating)  with 
many  opportunities  for  motor  responses,  thus  limiting  the 
amount  of  verbal  information  that  students  have  to  process 
at  any  one  time  and  increasing  the  number  of  motor  skill 
trials  in  which  students  can  engage.  Practices  were  more 
effective  when  the  teacher  stated  goals  for  the  practice  and 
monitored  the  movement  quality  of  the  practice.  The 
importance  of  task  specificity  of  practice  was  asserted  by 
researchers, emphasizing the need for skill practice to
match closely the situation in which the skill would be
used. The use of a variety of practice conditions in order
to reduce the possibility of error during game play was
encouraged by some researchers.

Feedback  has  been  a popular  subject  of  researchers 
in  physical  education  pedagogy.  Researchers  highlight 
feedback  (also  called  "knowledge  of  results")  as  the 
strongest,  most  important  variable  related  to  performance 
and  learning.  Bilodeau  and  Bilodeau  (1961)  stated  that 
there  was  no  skill  improvement  without  feedback,  progressive 
improvement  with  feedback,  and  a regression  of  skill  when 
feedback was no longer given. Five generalizations reported
by Ammons (1956) pertain to the study of effective teacher
behaviors.  First,  skill  improvement  would  be  more  rapid  and 
a higher  performance  level  would  result  if  feedback  was 
specific.  Second,  feedback  increased  the  rate  and  level  of 
learning.  Third,  the  motivation  level  of  learners  was 
affected  by  feedback,  usually  in  a positive  manner.  Fourth, 
if  feedback  were  delayed,  it  had  less  effect  than  feedback 
given  in  a timely  manner.  Fifth,  performance  decreased  when 
knowledge  of  performance  was  decreased.  Although  the  type 
of  feedback  that  was  given  could  be  varied,  it  was  important 
that  it  be  accurate,  specific,  and  complete  and  that  the 
student  or  students  understand  it.  The  value  of  success- 
oriented  feedback  rather  than  failure-oriented  feedback  was 
reported.  Researchers  also  found  that  feedback  affected 
males and females differently: The self-confidence of the
women was found to be negatively affected by failure-
feedback. Finally, the level of expectation a teacher had
for  a student  was  shown  to  be  extremely  influential  in  the 
performance  level  of  that  student.  Student  achievement  was 
higher  when  the  teacher  expected  quality  movement  from  the 
students  and  not  simply  good  attempts.  Teachers  were  found 
to  communicate  their  expectations  through  verbal  forms  of 
communication  and  such  overt,  nonverbal  means  as  nods, 
winks,  smiles,  and  pats  on  the  back.  Less  overt  actions 
included selecting a student for a leadership role, rotating
students so that everyone had an opportunity to play all
positions, or using ideas in class that were proposed by
students.

Some  of  the  assumptions  upon  which  this  study  is 
based  are  supported  in  the  literature  which  has  been 
reviewed:

1. It is possible to identify specific physical
education teacher behaviors.

2. Certain teacher behavior affects student
achievement.

3. Teacher behavior can be effective or
ineffective.

4.  It  is  appropriate  and  feasible  to  evaluate 
the  performance  of  a teacher  in  terms  of 
specific  instructional  behaviors. 


CHAPTER  III 

REVIEW  OF  THE  LITERATURE  ON 
TRAINING  AND  EVALUATION 
MODELS,  MATERIALS,  AND  METHODS 

The  first  two  problems  in  this  study  were  the 
identification  of  effective  teacher  behaviors  and  the 
development  of  a teacher  observation  instrument  for  use  in 
physical education activity classes. The results of a
literature survey providing the research base for the
development of this instrument are reported in Chapter II.
The development process for the instrument is covered in
Chapter  IV.  The  third  problem  in  this  study  was  to  design  a 
training  method  so  that  the  instrument  could  be  used 
effectively  in  other  locations  at  other  times.  In  this 
chapter,  the  results  of  the  literature  survey  covering 
training  and  evaluation  methods  are  reported. 

The  first  two  sections  of  this  chapter  contain  a 
survey  of  the  literature  describing  training  models  and 
training  materials.  The  third  section  is  a survey  of 
methods  used  to  evaluate  training.  This  third  section 
begins  with  a discussion  of  the  difference  between  research 
and evaluation and then describes various evaluation
techniques that can be used to establish the effectiveness
of  the  training  materials  and  method. 

Training  Models 

The  development  of  consistent  procedures  for 
training  observers  is  a necessary  consideration  in  the 
evolution  of  any  instrument.  No  matter  how  valid  or 
reliable  an  instrument  has  proven  to  be,  it  will  be 
effective  only  if  the  observer  codes  properly  and  uses  the 
instrument  in  the  manner  for  which  it  was  intended  (Martin, 
1977). In his paper on the development and use of classroom
observation instruments, Martin reported that the
development of standard procedures for training observer-
coders in the use of a category observation instrument was
an important consideration; however, he indicated that
developing such procedures was difficult because "our present
knowledge in this area is rather minuscule" (p. 52). Malitz (1978)
echoed  this  statement  and  stated  that  there  was  little  in 
print  to  serve  as  a guide  for  a researcher  or  practitioner 
who  wished  to  implement  an  observational  system.  "Details 
seem  especially  lacking  in  how  to  train  coders  to  use  these 
systems . . . furthermore, even where one is able to obtain
'expert' advice . . . this advice is [often] based upon
word-of-mouth or informal observation rather than upon
empirical evidence" (p. 3). An ERIC search resulted in the
identification  of  a limited  number  of  studies  dealing 
specifically  with  training  methods.  Most  of  those  studies 
were  conducted  for  the  primary  purpose  of  preparing 
observers  to  use  an  observation  instrument  designed  to 
gather  data  for  a specific  study  and,  therefore,  only 
briefly covered the training procedures which were used.
Few researchers discussed the training method as a separate
entity,  and  even  fewer  discussed  the  methods  used  to 
evaluate  the  effectiveness  of  the  training  method. 

In  developing  a training  method,  a fundamental 
consideration  should  be  the  background  and  previous 
experiences  of  those  being  trained.  What  knowledge  and 
experience  should  observers  bring  to  the  training?  How 
necessary  is  it  that  they  be  knowledgeable  about  teaching? 
Medley,  Coker,  and  Soar  (1984)  indicated  that  accuracy  of 
recording  was  not  dependent  on  the  recorder's  knowledge  of 
pedagogy,  and  that 

It  is  perfectly  feasible  to  train  any  reasonably 
intelligent  person  to  use  an  observation  schedule, 
however  naive  he  may  be  about  the  teaching  process 
. . . and his records may be expected to be just as
accurate as those of the most experienced and
sophisticated professional educator. (p. 80)

They stated that observers must learn two important things,
"the definitions of the signs or categories in the system
and the cues by which they can tell one from another [and]
. . . how to discipline themselves to disregard everything
else but those cues" (p. 130).

The question as to whether or not there was one best
method for training was discussed by Medley, Coker, and Soar
(1984) who stated that "due to the nature of each different
observation system the procedure for training observers for
each system will involve materials and procedures unique to
that system" (p. 130), a principle previously reported by
Stake (1975). A survey of the literature on training
methods substantiated that principle. It revealed a wide
variety of methods for training participants to code using
teacher or classroom observation instruments. Although
Martin (1977) did not detail specific training strategies,
he did express the opinion that, from his own experience,
those training programs which seemed to have been most
successful were typically based upon the following:

To  a greater  or  lesser  extent,  upon  what  might  be 
called  a task  analysis  of  the  activity  of 
sophisticated  observer-coders.  Simply  stated,  this 
procedure  involves  determining  the  subsets  of 
behaviors which make up the total observation-
recording process, isolating these subsets, and
sequencing their acquisition from simple to complex.
(p. 52)

In  other  words,  the  developers  of  the  more  successful 
training  methods  determined  the  observer  skills  and 
behaviors  which  made  up  the  total  observation-recording 
process, isolated those behaviors, and presented them in an
increasingly complex sequence. Becker, Englemann, and
Thomas  (1971)  reported  this  successive  approximation 
strategy  to  be  an  extremely  effective  learning  paradigm. 

Williams  (1973)  surveyed  the  literature  in  an  effort 
to  uncover  principles  for  training  raters  and,  from  her 
survey,  reported  several  effective  procedures.  She  stated 
that  an  initial  step  should  be  to  develop  interest  in  the 
process  and  to  establish  a rationale  for  the  measuring 
procedure.  The  scoring  procedure  should  be  made  as 
objective  as  possible,  thus  leaving  little  room  for 
questions  from  the  raters.  The  categories  developed  on  the 
instrument  should  be  defined  precisely  and  should  not 
overlap. Knowledge about the categories should be imparted
by means of a definition and/or a rationale, and examples of
typical actions/situations that could occur in each category
should be included in the training materials. Webb (1980)
also  emphasized  the  necessity  of  using  examples  in  the 
observer's  manual.  Williams  reported  that  rater  practice  in 
the  use  of  the  scoring  procedure  was  a crucial  aspect  of  the 
training  and  that  discussions  regarding  rating  discrepancies 
should  be  held  in  conjunction  with  practice  sessions.  She 
suggested  that  the  raters  should  be  made  aware  of  the 
various  rating  errors  which  can  result  from  leniency  and 
halo  effects.  To  help  combat  errors  due  to  a halo  effect, 
Williams directed the observers in her study to classify and
record teacher behaviors on the basis of their surface
appearance  and  not  to  ascribe  any  hidden  motives  to  them. 

One  of  the  most  comprehensive  reports  covering  the 
components  of  a training  process  was  presented  by  Webb, 
Bentley,  and  Rentz  (1970).  They  described  a 2-year 
inservice  program  designed  to  train  observers  in  the  use  of 
the  Reciprocal  Category  System  (RCS)  for  classroom 
observation.  Their  discussion  dealt  with  three  primary 
issues:

1. The development of training procedures to
produce maximum proficiency of observers with
the most economical use of time and training
personnel.

2. The development and testing of training
materials.

3. The development of methods to assess observer
competency. (p. 2)

During  the  evolution  of  the  training  program  for  this  system 
they  identified  nine  aspects  of  training  which  had  been 
evaluated  and  revised  as  necessary.  These  were  listed  under 
three major headings: organization, training procedures,
and observer competence. Organization included training
time, group size, and training sequence. Training
procedures included the manual developed for use with the
instrument, course content, and the trainer. Observer
competence included the type of instruments which were used
for evaluating observers' competence in the use of the RCS
instrument.

Training time referred to the number of hours
required  for  training  participants.  The  initial  training 
session  was  conducted  for  three  days;  succeeding  training 
sessions  were  reduced  to  two  days  (12  hours).  A comparison 
of  accuracy  scores  indicated  a slight  decrease  in  accuracy 
of observer codings with the shorter training session (62%
of the coders as opposed to 50% scored above .60 using
Scott's  method  for  observer  reliability);  however,  this 
reduction  in  trainee  accuracy  was  not  deemed  sufficient  to 
return  to  the  longer  training  session. 
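
The report summarized here does not reproduce the computation behind "Scott's method." As a minimal sketch, assuming the term refers to Scott's pi coefficient applied to two coders who each assign every observed event to exactly one category, agreement corrected for chance could be computed as shown below. The function name and the sample category codes are hypothetical, and the event-by-event form used here may differ from the tally-based variant commonly used with interaction analysis systems such as the RCS.

```python
from collections import Counter


def scotts_pi(coder_a, coder_b):
    """Scott's pi for two coders who coded the same sequence of events.

    pi = (observed agreement - expected agreement) / (1 - expected agreement),
    where expected agreement is computed from the pooled category
    proportions of both coders.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty sequence.")

    n = len(coder_a)

    # Proportion of events on which the two coders chose the same category.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Pool both coders' tallies, then square each category's joint proportion.
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())

    if expected == 1.0:
        raise ValueError("Pi is undefined when only one category is used.")

    return (observed - expected) / (1 - expected)


# Hypothetical codes: a trainee versus a criterion coding of four events.
trainee = [1, 1, 2, 2]
criterion = [1, 1, 2, 1]
print(round(scotts_pi(trainee, criterion), 3))
```

On this scale, a threshold of .60 would mean that a coder's agreement with the comparison coding exceeded chance-level agreement by at least 60% of the largest margin possible.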

One  of  the  more  pertinent  findings  from  Webb, 
Bentley,  and  Rentz  (1970)  had  to  do  with  the  size  of  the 
training  group.  They  tried  a variety  of  large  and  small 
group  organizational  patterns  (group  sizes  ranged  from  15  to 
94). A comparison of trainee performance in large (N = 94)
and small (N = 31) group training sessions indicated a
significant difference (p < .05) in favor of the large group
on the written test. On the accuracy measure, 55.3% of the
large group scored above .60 as opposed to 51.6% of the small
group (a nonsignificant difference). Webb, Bentley, and Rentz
reported the figures for only two of the groupings used in
the study, the two which were the most equivalent. They
indicated that while the criteria favor
the  large  group,  "it  is  more  important  to  note  that  they  do 
not favor small group size" (p. 6).

Training sequences in the Webb, Bentley, and Rentz
(1970)  study  involved  fixed,  variable,  and  standard 
approaches. Early sessions used a fixed schedule in which
all participants did the same activities (sometimes as one
large group and other times as small skills groups). This
method did not prove satisfactory, so a variable sequence
was employed. This was
achieved  by  teaming  instructors,  dividing  the  participants 
into  two  groups,  and  having  the  activities  "flip-flopped." 
Subjective  trainer  judgment  and  dissatisfaction  with  this 
approach  led  to  the  third  approach,  the  standard  sequence, 
in  which  each  trainer  conducted  the  entire  set  of  training 
activities  with  a single  group.  Although  no  performance 
scores  were  reported  related  to  the  trainer  sequence,  the 
subjective  judgment  of  the  trainers  indicated  that  they  felt 
the  last  approach  was  the  most  effective. 

The  category  identified  as  training  procedures  by 
Webb,  Bentley,  and  Rentz  (1970)  included  the  instructional 
manual  developed  for  training  observers  in  the  use  of  the 
instrument,  the  content  of  the  training  session,  and  the 
effect of a specific trainer. The training manual evolved
through three stages: a collection of articles with
interpretive comments; a manual written specifically for
training; and, finally, a semi-programmed revision of the
second manual. Refinement of the manual content was
continuing  at  the  time  of  the  report. 

The  content  of  the  training  session  was  initially 
separated into two major, and generally equal, categories:
(a) concepts and theory of the instrument and (b) observation
skill and calculations. Based on the opinion of the trainers that
the  emphasis  should  be  on  application  of  skills,  increased 
stress  was  placed  on  observer  performance  skill  after  the 
first  session. 

The  observer  competence  segment  of  training  included 
selection  of  the  instruments  which  were  used  to  evaluate 
observer  competence.  Four  types  of  instruments  were 
considered initially for assessing observer proficiency:
(a) a concept test which stressed understanding of the major
aspects  of  the  system,  (b)  criterion  audiotapes  which 
measured  the  extent  to  which  a trainee  could  accurately  code 
the  verbal  interaction  depicted  on  the  tapes,  (c)  a skill 
test  which  was  concerned  primarily  with  data  preparation  and 
interpretation,  and  (d)  a final  exam  which  included  content 
covered  by  the  skill  test  and  the  concept  test.  Analysis  of 
data  indicated  that  the  two  most  promising  evaluation 
instruments  were  the  concept  test  and  the  criterion 
audiotape.  During  the  course  of  the  training,  both 
evaluative  instruments  were  revised  several  times.  A 
standard grading key for the tape was also developed from a
composite of the ratings of the five people who had served
as  trainers.  At  the  time  of  the  report,  another  criterion 
audiotape  was  being  developed  which  would  better  represent 
all  of  the  categories  in  the  observation  system. 

Other  training  techniques  were  reported  by  Giesen 
and  Sirotnik  (1979)  in  discussing  the  training  of  observers 
for  A Study  of  Schooling  (Goodlad,  1984).  They  used 
homework,  student-written  vignettes,  speed  drills,  slides, 
coding  of  television,  and  videotapes.  Giesen  and  Sirotnik 
stated  that  the  use  of  videotapes  had  been  a particularly 
valuable  method  for  clarifying  code  definitions  and  for 
achieving  agreement  among  observers.  When  observers 
disagreed, the tape was backed up and replayed repeatedly
until clarification and consensus were reached. Coding of a
timed section of the videotape and checking it against the
trainers' coding was used as an accuracy check for the
participants  throughout  the  training.  After  completion  of 
each  of  these  checks,  disagreements  and  sequences  of 
interactions were discussed with individual observers. Webb
and Brown (1969), in their training, also encouraged the
observers to achieve agreement in their responses. When
their observers were faced with a behavior which resulted in
conflicting opinions as to the correct category for coding,
they were encouraged to record in terms of the research
which provided the theoretical basis of the instrument.

Roderick (1975) dealt with the possibility of
working with protocols which did not include videotapes.
She found that skill in applying an observation instrument
could  be  achieved  by  using  written  or  audiotape  narratives 
which  described  interactions  which  occurred  in  the 
classroom.  Further,  she  determined  that  applying  the 
categories  to  a written  narrative  was  another  way  of  coding 
even  after  categories  were  learned.  Three  of  the  directions 
and  guidelines  used  by  Roderick  in  each  of  her  protocols 
appeared  especially  pertinent  for  the  development  and 
evaluation  of  protocols  for  a study  of  this  nature.  They 
are  as  follows: 

1.  An  initial  statement  which  gives  directions  for 
observing  the  protocol  and  pertinent  information 
about  the  category  focus; 

2.  Counter  numbers  designating  where  to  begin  and 
when  to  pause  when  using  audio  or  videotapes; 

3.  The  correct  answer(s).  (p.  18) 

When  developing  an  instrument  to  be  used  in 
observing  teacher  behavior  during  instruction,  the 
possibility  of  using  actual  classroom  visits  as  part  of  the 
training  process  must  be  considered  since  this  decision  has 
implications  regarding  time,  cost,  and  control  of  training. 
The  Florida  Performance  Measurement  System  training  (Florida 
DOE,  1986b)  was  done  entirely  using  videotapes  of  classrooms 
in action. Conversely, observer training for A Place Called
School (Goodlad, 1984) was done by alternating sessions of
videotape practice with actual in-class practice paired with
another  observer.  Webb  (1980)  also  reported  using  a 
combination  of  training  sessions  and  on-site  visits  to 
actual  classrooms.  His  training  session  was  designed  to 
familiarize  the  observers  with  the  system,  to  provide  them 
with  some  experience  in  observing,  and  to  measure 
interobserver  agreement.  At  the  conclusion  of  the  training 
sessions  pairs  of  participants  observed  in  classrooms.  Webb 
reported  that  actual  classrooms  were  used  for  several 
reasons.  First,  the  complexity  of  the  observation  system 
required  that  an  observer  become  aware  of  the  previous 
activities  and  behaviors  of  the  students  being  observed  as 
well  as  their  relationship  with  the  teacher  and  other 
students.  Second,  there  was  not  enough  preparatory  time  and 
equipment  to  make  the  high-quality,  split-screen  videotapes 
which  would  be  necessary  to  accomplish  this  purpose  in  a 
training  session.  Third,  the  ratio  of  trainers  to  trainees, 
1:2,  was  small  enough  to  provide  an  adequate  amount  of  time 
to  discuss  discrepancies  in  the  coding  of  classroom  events. 
Webb  reported  that  a disadvantage  of  conducting  the  training 
in  actual  classrooms,  however,  was  that  the  observers  might 
not  be  exposed  to  all  possible  situations. 

In their paper, Medley and Norton (cited by Frick &
Semmel, 1978) concluded that training performed in the field
to determine observer agreement before actual observational
data collection should be discontinued. Rather,
investigators need only document that their observers were
competent upon completion of training, competency being
determined by evidence of nearly perfect observer agreement
on unambiguous examples of behavioral categories shown on
videotape. Frick and Semmel (1978) noted that the purpose
of a criterion-related agreement measure, such as that used
at the end of the training, is to test an observer's
knowledge of the items on the observation instrument and not
their classroom observation.

Observer agreement is . . . not synonymous with
reliabilities . . . observers can be trained to a
high level of agreement, yet they can collect very
unreliable data if the behaviors of the observed
teachers/pupils differ little, or if behaviors are
truly unstable from occasion to occasion . . . the
measurement will be unreliable regardless of the
extent of observer agreement. (p. 4)
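
The distinction drawn in this passage can be illustrated with a small, entirely hypothetical data set. In the sketch below, two observers record identical tallies for five teachers on each of two occasions, so observer agreement is perfect, yet the teachers' relative standing shifts between occasions, so the occasion-to-occasion correlation is poor. The numbers, the variable names, and the choice of a Pearson correlation as a stability index are assumptions made for illustration; they are not values or methods taken from Frick and Semmel.

```python
from math import sqrt


def percent_agreement(obs_a, obs_b):
    """Proportion of records on which two observers entered the same tally."""
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return matches / len(obs_a)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical tallies of one behavior for five teachers.
# Observers A and B agree exactly on every record.
occasion_1_a = [10, 12, 11, 13, 12]
occasion_1_b = [10, 12, 11, 13, 12]
occasion_2_a = [12, 11, 13, 10, 12]
occasion_2_b = [12, 11, 13, 10, 12]

agreement = percent_agreement(occasion_1_a + occasion_2_a,
                              occasion_1_b + occasion_2_b)
stability = pearson_r(occasion_1_a, occasion_2_a)

print(f"observer agreement: {agreement:.2f}")
print(f"occasion-to-occasion correlation: {stability:.2f}")
```

In this invented case the teachers differ little from one another and their scores are unstable across occasions, which are the two conditions the quotation names as sources of unreliable data despite high observer agreement.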

In  summary,  the  studies  reviewed  indicated  that 
several  procedures  for  training  observers  seemed  to  be 
particularly  applicable  to  designing  the  training  for  an 
observation  instrument.  In  the  organization  of  the 
training,  it  appeared  that  one  trainer  conducting  the  entire 
session  with  a single  group  would  be  effective.  To  keep  the 
scoring  procedure  as  objective  as  possible,  the  categories 
on  the  instrument  should  be  clarified  to  minimize 
uncertainty  regarding  the  correct  category  for  tallying 
teacher behaviors. The use of a variety of examples should
assist in this process of clarification. Coding
information,  during  the  training  session,  should  be 
presented  in  a sequential  order  from  simple  to  complex. 
Several  researchers  emphasized  the  importance  of  giving  the 
raters  as  much  practice  time  as  possible  since  an  increase 
in observer skill seemed to be affected more by an increase
in the amount of time allocated for practicing with the
instrument than by time spent discussing theory. The practice
sessions  should  include  discussions  about  rating 
discrepancies,  and  trainees  should  be  encouraged  to  record 
teacher  behaviors  on  the  basis  of  what  they  actually  see  and 
not  to  infer  any  motives  for  the  behavior.  Suggestions 
should  also  be  given  to  assist  in  combatting  observer  errors 
due  to  factors  such  as  halo  effect  and  observer  fatigue. 
On-site  observations  do  not  appear  to  be  necessary  in 
training  observers  to  use  an  instrument  since  the  purpose  of 
the  training  session  is  to  make  trainees  proficient  in  the 
use  of  the  instrument  itself.  During  the  training  it  is 
important  to  ensure  that  all  indicators  are  covered. 

In  evaluating  the  skill  of  the  observers,  use  of  a 
criterion  tape  is  indicated  not  only  for  the  final 
assessment,  but  as  a benchmark  for  measuring  observer 
progress  during  training.  A videotape  was  used  for  this 
purpose  in  several  studies  and  an  audiotape  was  used  in 
another study. Other studies found the use of homework,
written tests, and written narratives very helpful in the
learning  and/or  evaluation  process.  The  need  to  provide  the 
correct  answers  for  all  protocols  as  well  as  a need  to 
provide  any  pertinent  information  about  the  focus  of  the 
training  materials  was  emphasized  in  the  research. 

Training  Materials 

Thiagarajan  (1973)  noted  that  a number  of 
observation  systems  are  dependent  on  their  original 
developers  for  training  coders  and  pointed  out  that  a 
training  package  would  facilitate  the  training  of  additional 
observers  by  someone  other  than  the  instrument  developer. 
There is a large variety of training materials that could be
a part of such a training package: a training manual;
instruments for assessing observer proficiency (Webb,
Bentley,  & Rentz,  1970);  homework  materials,  student-written 
vignettes,  speed  drills,  slides,  and  videotapes  (Giesen  & 
Sirotnik,  1979);  and  written  and/or  audiotape  narratives 
(Roderick,  1975).  Williams  (1973),  in  her  study  on  the 
effects  of  training,  made  three  recommendations  regarding 
the effectiveness of materials for the training process.
She indicated that there should be examples of acceptable
and unacceptable responses, practice in scoring sample
responses, and discussions of rating discrepancies.
Williams felt that trainees needed immediate feedback and
designed her materials with the answers hidden under a slip
of  paper  which  could  be  lifted  by  the  participant  to  check  a 
response.  This  also  ensured  consistency  since  all  raters 
were  exposed  to  the  same  information.  She  was  concerned 
that  if  the  training  were  conducted  with  groups  of  raters 
and  the  information  given  out  only  by  the  group  leader,  the 
specific  information  would  be  contingent  on  the  nature  of 
the  group  and  of  the  leader. 

Materials  which  are  developed  for  training  should  be 
designed  with  high  standards  of  excellence  in  mind  for  the 
most  effective  use.  Hanson,  Bailey,  and  Monteiro  (1971) 
suggested  that  in  developing  scripts  and  materials  for  use 
during  actual  lessons  these  three  guidelines  be  kept  in 
mind: 

1.  To  include  all  behaviors  to  be  observed. 

2.  To  introduce  the  behaviors  slowly,  progressing 
from  less  to  more  complex  behavior  sequences. 

3.  To  provide  much  practice  on  a variety  of 
behavior  sequences.  (p.  7) 

Herbert  and  Attridge  (1975)  developed  criteria  to  be 
used  in  the  development  of  observation  manuals  and  systems. 
Thirty-three  criteria  were  identified  and  sorted  into  three 
main  types:  identifying,  validity,  and  practicality. 

Identifying  criteria  enable  users  to  select  the  correct 
instrument  for  their  purposes.  Validity  criteria  relate  to 
the  accuracy  with  which  the  instrument  represents  the 
observed events. Practicality criteria provide information
about the ease of administration and dissemination of
results.  A summary  of  their  criteria  (using  their  number 
system)  follows: 


Identifying  Criteria 

1.  Identifying  criteria  contain  information 
which  enables  the  user  to  decide  if  a specific 
instrument  is  appropriate  for  his  purpose  and 
application.  They  include  the  following: 

1.1 A title.

1.2  A statement  of  purpose. 

1.3  The  support  underlying  the  instrument. 

1.4  The  behaviors  on  which  the  instrument  is 
focused.

1.5  The  applications  for  which  the  instrument 
is  intended. 

1.6  Any  situations  for  which  the  instrument 
should  not  be  used.  (Herbert  & Attridge, 
1975,  pp.  4-6) 


Validity  Criteria 


Validity criteria include inference, context, and
reliability. They include the following:

2.1 Item specification

2.11  All  items  should  be  as  clearly  and 
unambiguously  defined  as  possible. 

2.12  The  definitions  must  be  consistent 
with  their  use  in  the  theory  which 
they  represent. 

2.13  Items  comprising  the  instrument  must 
be  exhaustive  of  the  dimension(s)  of 
behavior  under  study. 

2.14  Items  must  be  representative  of  the 
dimension(s)  being  studied. 

2.15 There must be no overlap of
behaviors; they must be mutually
exclusive.

2.16  Ground  rules  for  the  implementation 
of  the  instrument  and  for  the 
categorization of borderline and/or
unusual behaviors must be specified.
(Herbert & Attridge, 1975, pp. 7-9)

2.2 Inference

2.21  Instrument  items  must  be  as  low  in 
the  degree  of  observer  inference 
required  as  possible  taking  into 
account  the  complexity  of  the 
teacher behavior under study.

2.22  The  nature  and  extent  of  observer 
inference  and  methods  of  reducing 
and/or  controlling  it  must  be 
explicated.

2.23  The  nature  of  inferences  that  can  be 
made  must  be  carefully  described  and 
substantiated.

2.24  The  method  of  inferential  treatment 
should  be  specified.  (pp.  10-12) 

2.3  Context 

2.31  The  problem  of  context  (i.e., 
physical,  social,  behavioral, 
temporal  surroundings)  must  be 
recognized  and  explicated. 

2.32  Methods  of  reducing  and/or 
controlling  use  of  context  must  be 
explicated.  (pp.  12-13) 

2.4  Observer  effect 

2.41 Observer, and other effects, should
be explicated. This is both a
validity and practicality criterion.
(p. 13)

2.5  Reliability 

2.51  The  types  of  reliability  assessed, 
their  meaning,  and  the  conditions 
under  which  they  were  determined 
must  be  reported.  (p.  14) 

2.6  Validity  procedures 

2.61 Each instrument should be
accompanied by the methods employed
to  test  its  validity,  the  results 
obtained,  and  the  purpose  for  which 
these  results  apply.  (p.  15) 


Practicality  Criteria 

These  criteria  are  concerned  with  the  ease  of 
implementation of a given system, its acceptability to those
under study, the complexity of the data-gathering mechanisms
required,  the  training  procedures  entailed,  etc. 

3.1  Instrument  items 

3.11  Items  comprising  the  instrument 
should  be  relevant  to  its  purpose. 

3.12  Codes  identifying  categories  should  be 
simple,  easy  to  remember,  and  convenient 
to  record. 

3.13  Categories  and  their  codes  should  be 
capable  of  being  easily  learned  by 
observers. (Herbert & Attridge, 1975,
p. 16)

3.2  Observers 

3.21  Where  special  qualifications  of 
observers  are  required,  these  should 
be  made  clear. 

3.22  Training  procedures  for  observers, 
including  number  of  observers, 
procedural  steps,  duration,  and 
results  must  accompany  the 
instrument.  Necessary  manuals, 
tapes,  films,  or  other  training 
devices  should  be  easily  available. 

(p.  16) 

3.3  Collection  and  recording  of  data 

3.31  The  manual  should  recommend  the 
number,  location,  and  functions  of 
observers  and  other  staff  needed  in 
the  observation  setting  and 
elsewhere.

3.32  Data  collection  and  recording 
procedures  must  accompany  the 
instrument.

3.33  The  observation  unit  recommended  by 
the  system  must  be  specified. 

3.34  The  coding  unit  recommended  by  the 
system  must  be  specified. 

3.35  Procedures  for  analyzing  data  should 
be  described  and  discussed. 

3.36  Recommended  data  transmission  and 
display  techniques  for  an  instrument 
should  be  described. 

3.37 Costs likely to be incurred in the
use of the instrument should be
estimated. (pp. 17-18)

A training manual and training sessions meeting
these  criteria  should  give  potential  observers  a thorough 
knowledge  of  the  indicators  and  categories  on  the  instrument 
with  their  theoretical  underpinnings  and  guidance  for  making 
appropriate  inferences  about  teacher  behavior. 

Evaluation  Design 

To  assess  the  effectiveness  of  a training  program, 
evaluation  instruments  have  to  be  developed.  Since  an 
evaluation  of  training  effectiveness  may  incorporate  several 
evaluation  models,  thus  becoming  hybrid  (Semple,  1974),  the 
background  of  the  field  of  evaluation  is  reviewed  here  to 
identify  criteria  that  should  be  used  for  the  selection  and 
development  of  the  specific  measurement  tools  chosen  to 
assess  the  effectiveness  of  the  training  method  developed 
for  this  study. 

Although  evaluation  techniques  were  used  as  early  as 
2000  B.C.  in  China,  it  was  not  until  the  mid-20th  century 
that  evaluation  "took  on"  an  independent  identity,  one  that 
began  to  differentiate  it  from  the  field  of  research. 
According  to  Worthen  and  Sanders  (1973),  the  main 
differences  between  the  two  fields  are  as  shown  in  Table  1. 

Table 1

Differences between the Fields of Research and Evaluation

                                  Research                    Evaluation

 1. MOTIVATION OF INQUIRER        Satisfies curiosity         Solves practical problems
 2. OBJECTIVE OF SEARCH           Seeks conclusions           Leads to decisions
 3. LAWS VS. DESCRIPTIONS         Quest for laws              Seeks to describe
 4. ROLE OF EXPLANATION           Seeks scientific laws       Full explanation unnecessary
 5. AUTONOMY OF INQUIRY           Sets own problems           Undertaken for client
 6. PROPERTIES OF PHENOMENA       Scientific truth            Assesses "worth"
    WHICH ARE ASSESSED
 7. GENERALIZABILITY              Wide application            Local interest
 8. SALIENCE OF VALUE QUESTION    Not important               Point of the study
 9. INVESTIGATIVE TECHNIQUES      Must answer all questions   Something is causing the
                                                              difference
10. CRITERIA FOR JUDGING          Internal/external validity  Credibility
    ACTIVITY
11. DISCIPLINARY BASE             Standard research           Uses a wide range of
                                  techniques                  techniques
12. TRAINING                      Thorough mastery of         Sample of several
                                  traditional social          disciplines
                                  science disciplines

Source: Worthen and Sanders (1973, pp. 27-35)

Basically, stated Worthen and Sanders (1973),
research is "the activity aimed at obtaining generalizable
knowledge by contriving and testing claims about
relationships among variables or by describing generalizable
phenomena . . . these claims may or may not have immediate
application" (p. 19), while evaluation "is the determination
of the worth of a thing" (p. 19). Evaluation has often been
considered  merely  a form  of  applied  research  which  focuses 
only  on  one  curriculum,  one  program,  or  one  lesson;  this 
view  was  rejected  by  Worthen  and  Sanders.  They  stated  that, 
while  applied  research  is  aimed  at  producing  knowledge  which 
could  be  relevant  to  a problem  having  general  application  in 
education,  evaluation  is  more  narrowly  focused,  collecting 
information  relevant  to  a specific  problem,  program,  or 
product.  Both  the  fields  of  research  and  evaluation, 
nevertheless,  are  considered  as  disciplined  inquiry,  the 
central  attitude  of  which  was  defined  by  Cronbach  and  Suppes 
(1969)  as  that  "which  places  a premium  on  objectivity  and 
evidential test" (pp. 15-16). As can be seen from Worthen
and Sanders's list of differences between research and
evaluation  (Table  1),  the  criteria  for  judging  a research 
study  are  limited  to  internal  and  external  validity  factors, 
while  evaluation  studies  can  use  any  data-gathering 
techniques  that  appear  credible,  including  those  from 
classical  research.  For  example,  Cronbach  (1963)  pointed 
out  that  it  was  appropriate  to  use  attitude  measures  as  well 
as  outcome  measures  in  evaluation.  Attitude  measures  refer 
to changes observed in pupils and are usually measured by
direct or indirect questioning, interviews, or
questionnaires (Worthen & Sanders, 1973). Scriven (cited in
Worthen & Sanders, 1973, pp. 60-104) supported this view by
emphasizing  that  evaluators  should  be  concerned  with  process 
as  well  as  outcome  (i.e.,  achievement). 

Frameworks for evaluation studies encompass three
strategy roles: judgmental, decision-management, and
decision-objective (Worthen & Sanders, 1973). Decision-objective
models by Tyler, Hammond, and Provus appeared to
provide  the  most  appropriate  methods  for  evaluating  the 
effectiveness  of  a training  component  since  the  stated 
purpose  of  such  decision-objective  models  is  to  determine 
the  extent  to  which  the  purposes  of  a learning  activity  are 
actually  being  realized.  Also  inherent  in  these  models  is 
the  presence  of  feedback  on  goal  achievement  for  program 
(and  product)  modification  and  an  emphasis  on  the  importance 
of  looking  at  many  factors  when  making  a judgment.  A brief 
description  of  these  models  follows. 

Tyler  (1942)  defined  evaluation  as  the  comparison  of 
student  performance  with  clearly  specified  objectives.  The 
purpose  and  key  emphases  in  his  model  were  the  specification 
of  objectives  and  the  measurement  of  learning  outcomes  of 
students.  In  using  this  model,  actual  student  performance 
data  will  provide  information  for  the  decision-maker  for 
evaluating the strengths and weaknesses of a course or
curriculum. Tyler's model depends on a pre-post measurement
of performance and is structured around the following major
steps:

1.  Establishment  of  broad  goals  or  objectives 

2.  Classifying  the  objectives 

3.  Defining  objectives  in  behavioral  terms 

4.  Finding  situations  in  which  achievement  of 
objectives  can  be  shown 

5.  Developing  or  selecting  measurement 
techniques 

6.  Collecting  student  performance  data 

7.  Comparing  data  with  behaviorally  stated 
objectives 

According  to  Tyler  (1942),  evaluation  should  lead  to  some 
type  of  decision  and  is  a recurring  process.  Evaluation 
feedback may be used to reformulate or redefine objectives,
and information derived from previous evaluation studies may
be used to develop further plans for assessment and
interpretation. Modifications of the objectives and of the
program  being  evaluated  should  result  in  corresponding 
revision  of  the  plan  and  program  of  evaluation. 

Hammond  (cited  by  Worthen  & Sanders,  1973)  also 
assessed  the  effectiveness  of  programs  by  comparing 
behavioral  data  with  objectives.  Although  he  encouraged 
self-evaluation, Hammond also called for an outside
evaluator to work as a consultant to local program directors
by providing expertise in data collection and in training
local evaluators. He identified three dimensions to be
taken into consideration during an evaluation: institution,
instruction, and behavior. Hammond emphasized the necessity
for  empirical  research  and  also  considered  evaluation  to  be 
a continuing  process,  one  that  provides  feedback  on  goal 
achievement  for  program  modification. 

Provus (1969) designed his model for a large school
system. He emphasized the comparison of performance against
standards for the purpose of determining whether to improve,
maintain, or terminate a program. He divided his model into
four major developmental stages: (a) definition,
(b) installation, (c) process, and (d) product. In using
the model, one moves through stages and content categories
searching for discrepancies and identifying standards to be
used for future comparisons. This process of evaluation was
also designed to be continuous, allowing for program
improvement as well as assessment at early stages or at the
end.

Although  these  models,  as  described  by  their 
developers,  did  not  lend  themselves  to  direct  application  in 
the  evaluation  of  a training  program  of  limited  scope  and 
duration,  general  principles  appropriate  for  evaluating  the 
training developed for this study were gleaned from these
models. The principles selected were (a) the need for
continuous  evaluation;  (b)  the  emphasis  on  the  comparison  of 
performance  to  preset  objectives;  (c)  the  need  for  empirical 
data;  and  (d)  the  use  of  evaluative  data  to  improve, 
maintain,  or  terminate  a program. 

Evaluation  models  used  in  a variety  of  training 
situations  and  evaluation  techniques  and  models  used  in 
fields  other  than  education  shed  further  light  on  factors 
which  should  be  considered  when  developing  an  evaluation 
design.  Semple  (1974),  writing  for  the  U.S.  Naval  training 
program,  pointed  out  four  levels  of  evaluation.  The  lowest 
level  was  qualitative.  It  involved  examination  of  content, 
methods,  media,  and  procedures  used  for  training  in  terms  of 
specified  objectives.  Data  sources  included  personal 
observations,  existing  documentation,  and  interviews  with 
training  personnel  and  students.  Only  limited  conclusions 
regarding  the  effectiveness  of  training  could  be  drawn  at 
this  level.  The  next  level  of  evaluation  involved 
noncomparative  performance  measurement.  It  was  the  crudest 
form  of  quantitative  evaluation  employed  and  involved  the 
measurement of student performance at the beginning, end,
and (in some cases) throughout training.
Improvements  in  performance  became  a crude  index  of  training 
effectiveness.  The  third  level  involved  comparative 
measurement through greater control of the training
environment and the introduction of standardized exercises
at  the  beginning,  end,  or  throughout  training.  The  fourth 
level,  transfer  of  training,  was  generally  considered  to  be 
the  ultimate  test  of  the  effectiveness  of  a training 
program.  This  level  involved  comparative  measurement  of 
task  performance  in  a work  situation. 

Semple (1974) also divided measurement information
into three categories: subjective, quantitative, and second
generation. He indicated that each category could provide
valuable  information  for  a training  effectiveness 
evaluation.  Subjective  measurement  referred  to  obtaining 
performance  information  based  upon  individual  and  group 
impressions;  the  methods  used  included  direct  observations, 
interviews,  questionnaires,  and  rating  scales.  Although 
information  derived  in  this  manner  did  not  provide  a firm 
basis  for  making  training  effectiveness  decisions,  it  did 
offer  a unique  capability  for  assessing  user  acceptance  of  a 
training  program — a relatively  important,  but  frequently 
overlooked  factor.  Subjective  measurement  could  also 
produce  meaningful  design  feedback  information,  particularly 
with  respect  to  training  procedures  and  software  design. 

Quantitative  measurement  referred  to  obtaining 
numerically  related  performance  information  by  using 
performance  checklists,  rating  scales,  or  other  instruments 
that measure trainee performance. Quantitative measures
provided objective data on performance which could lead to
decisions  regarding  the  effectiveness  of  the  training 
process  as  it  related  to  potential  job  performance. 

Second-generation  measures  were  statistical  measures 
which  were  developed  from  first-generation  measures.  Such 
measures were used to compare a training experimental group
with a control group and to provide measures of comparison
(i.e., descriptive statistics) which could be used to
summarize the findings of the evaluation in easily
understood terms, thus facilitating communication. Basic
descriptive  statistics  suggested  by  Semple  (1974)  included 
the  mean,  median  and  mode,  standard  deviation,  and  variance. 
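
The descriptive statistics named above can be sketched with Python's standard `statistics` module. The score lists and the `summarize` helper are hypothetical and serve only to show how a trained group and a control group might be summarized side by side; they are not data from Semple (1974) or from this study.

```python
import statistics

def summarize(scores):
    """Return the basic descriptive statistics for a list of test scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        # stdev and variance use the sample (n - 1) formulas.
        "stdev": statistics.stdev(scores),
        "variance": statistics.variance(scores),
    }

# Hypothetical criterion-test scores for two small groups.
trained_scores = [18, 20, 17, 20, 19]
control_scores = [12, 14, 12, 15, 13]

trained_summary = summarize(trained_scores)
control_summary = summarize(control_scores)
```

Whether the sample or the population formula is appropriate depends on whether the groups are treated as samples from a larger pool of potential observers; the sample formulas are assumed here.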

Writing for the field of business, Brethower and
Rummler (1979) pointed out that there were four questions an
evaluator should seek to answer: (a) Do trainees like the
training? (b) Do trainees learn from the training? (c) Do
trainees use what they learn? and (d) Does the organization
benefit from the newly learned performance? Their
evaluation matrix listed potential items and situations
which might be measured, measurement dimensions, sources of
data, and alternative data-gathering methods. Among the
variety of data-collection methods listed, the
following appeared pertinent for this study: trainer
observation, participant interview, use of a questionnaire,
and document review. In the process of adapting their
general systems model to education, they emphasized the
provision for continuous feedback during the process of
development.

A primary  concern  when  designing  a training  model  is 
the  question  of  when,  during  the  design  process,  one  should 
assess  the  effectiveness  of  the  training.  Semple  (1974) 
suggested,  as  a minimum,  the  beginning  and  end  of  the 
training  period  and  possibly  throughout  training.  Hamblin, 
in  his  model  (1974),  indicated  five  levels  and  times  at 
which  some  type  of  evaluation  might  be  appropriate.  Level  1 
assessments  were  to  discover  the  reactions  of  the 
participants  to  the  training.  The  data  were  used  for  making 
training  programs  more  responsive  to  participant  attitudes 
regarding  the  methods  used  in  the  training  and  for  possibly 
structuring  learning  in  a more  effective  manner  in  the 
future.  Level  2 assessments  were  to  evaluate  the  actual 
learning  which  had  taken  place.  Techniques  such  as  tests 
(objective,  essay,  standardized,  and  tailor-made)  and  skill 
analysis  were  used.  Level  3 assessment  was  focused  on  the 
job  behavior  of  participants  subsequent  to  the  training  in 
an  effort  to  evaluate  the  effectiveness  of  that  training 
based  upon  the  changes  in  job  skill.  Some  of  the  techniques 
used  at  this  level  were  on-the-job  observation,  use  of 
diaries,  activity  sampling,  and  depth  interviews.  Some  of 
these techniques were particularly useful for obtaining data
about a trainee's on-the-job behavior between training
sessions. This information could then be used for revising
objectives for the remainder of the training sessions. In
level 4 assessment, data on changes in productivity, labor
turnover, organizational climate, and work flow were used to
evaluate the possible organizational effect of training
programs. In level 5, such data as cost-benefit and human
resource  accounting  were  used  to  evaluate  the  economic 
effect  of  specific  training  programs. 

Of the 38 specific evaluation techniques suggested
by Hamblin (1974), 3 seemed to be appropriate for this
study: end-of-course or session reaction forms to determine
participant reactions; objective tests; and special,
tailor-made techniques for evaluating skill. Tailor-made
techniques were those designed to match a specific training
situation. All three of these techniques were from level 1
or level 2, since only techniques at these levels could be
applied during the training session. Evaluation
techniques in the other three levels (job behavior,
organization, and ultimate value) could be used only after
the training session was completed and the trainees were back
on the job and were, therefore, inappropriate for this
study.

The  developers  of  any  training  method  for  an 
observation instrument must make a decision as to which
criteria will be used as a standard for effective
observation  performance.  The  two  standards  generally  used 
have  been  average  reliability  (averaging  the  tallies  of 
selected  observers  and  using  that  average  as  the  criterion 
of  correctness)  and  criterion  instrument  accuracy  score 
(codings  made  by  an  expert  in  the  system)  (Malitz,  1978). 

As  a measure  of  coder  performance,  average  reliability 
suffers  from  the  fact  that  it  is  not  compared  to  an 
objective  standard  of  correct  coding;  instead,  it  defines 
the  best  coder  as  the  one  who  achieves  the  highest 
reliability  with  all  other  coders.  According  to  Malitz,  a 
more  generally  accepted  index  of  a coder's  performance  is 
"that  coder's  degree  of  agreement  with  an  expert  coder  on  a 
sample  of  classroom  behavior"  (p.  5).  Malitz  conducted  a 
study  to  discover  some  evidence  relating  to  the  validity  of 
average  reliability  as  a measure  of  coder  performance, 
compared  to  using  a criterion  instrument  score.  His  study 
was  designed  so  that  some  expert  codings  were  calculated. 
This  made  it  possible,  using  selected  videotapes,  to  compute 
for  each  coder  both  the  average  reliability  and  the  degree 
of  agreement  with  an  expert.  The  videotapes  used  for  this 
experiment  were  made  in  traditionally  structured,  public 
school  fifth-  and  seventh-grade  classrooms.  The  tapes  were 
all 25-30 minutes in length, the length of an average lesson,
and the teachers were encouraged to be as natural as
possible. Malitz found that each coder's average
reliability on all three average reliability checks
correlated significantly (p < .001) with his or her
reliability when compared to the expert's codes. Malitz's
results  indicated  that  either  method  of  evaluating  observer 
performance  yielded  a valid  prediction  of  probable  future 
success  as  an  observer. 
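
The two standards discussed above can be contrasted in a short sketch. Simple percent agreement is assumed here as the agreement index, and average reliability is taken as a coder's mean agreement with every other coder; Malitz's actual formulas are not given in this review. The coders, the code letters, and the data are all hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Fraction of coding intervals on which two codings match."""
    if len(codes_a) != len(codes_b):
        raise ValueError("codings must cover the same intervals")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def average_reliability(name, codings):
    """Mean agreement of one coder with every other coder (no expert)."""
    others = [codes for other, codes in codings.items() if other != name]
    total = sum(percent_agreement(codings[name], codes) for codes in others)
    return total / len(others)

def criterion_accuracy(name, codings, expert_codes):
    """Agreement of one coder with the expert's coding of the same tape."""
    return percent_agreement(codings[name], expert_codes)

# Hypothetical interval-by-interval codes for one criterion tape.
expert_codes = ["D", "F", "F", "S", "D", "F"]
codings = {
    "A": ["D", "F", "F", "S", "D", "F"],
    "B": ["D", "F", "S", "S", "D", "F"],
    "C": ["D", "D", "S", "S", "F", "F"],
}

best_by_average = max(codings, key=lambda n: average_reliability(n, codings))
best_by_criterion = max(
    codings, key=lambda n: criterion_accuracy(n, codings, expert_codes)
)
```

The data were constructed so that the two standards rank the coders differently: coder A matches the expert exactly, yet coder B agrees most with the other coders. This is the weakness of average reliability noted above; Malitz's own data, by contrast, showed the two measures to be significantly correlated.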

The  use  of  a criterion  tape  accuracy  score  may  not 
fully  assure  observer  accuracy.  Norman  Webb  (1980),  in  his 
report  on  the  Individually  Guided  Education  (IGE)  project, 
stated  that  using  a criterion  tape  accuracy  score  to  measure 
the  observers'  knowledge  of  the  system  worked  adequately  for 
categories  which  were  frequently  observed;  however,  checking 
accuracy  in  coding  behaviors  that  had  a low  frequency  of 
occurrence  on  criterion  tapes  usually  led  to  low 
interobserver agreement. These categories were, therefore,
given more emphasis in subsequent training. The implication
is  that  criterion  tapes  need  to  contain  a sufficient  number 
of  all  behaviors  one  intends  observers  to  be  able  to  code. 
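
One way to act on this implication is to count how often each category occurs in the expert's coding of a criterion tape before the tape is used. The sketch below is illustrative only: the `underrepresented` helper, the minimum count of three, and the tape contents are assumptions, and the category labels are borrowed from the five categories of the instrument developed in this study rather than from Webb (1980).

```python
from collections import Counter

def underrepresented(expert_codes, categories, min_count):
    """List categories that occur fewer than min_count times on the tape."""
    counts = Counter(expert_codes)  # missing categories count as zero
    return sorted(c for c in categories if counts[c] < min_count)

categories = [
    "efficient use of time",
    "lesson development",
    "demonstrations",
    "supervised performance",
    "feedback",
]

# Hypothetical expert coding of one criterion tape.
tape_codes = (
    ["feedback"] * 6
    + ["demonstrations"] * 4
    + ["lesson development"] * 3
    + ["supervised performance"] * 1
)

too_rare = underrepresented(tape_codes, categories, min_count=3)
```

Categories flagged in this way would call for additional taped examples or for extra emphasis during training.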

Webb, Bentley, and Rentz (1970) reported on the use
of four types of instruments to evaluate observer
competence: (a) concept test, (b) criterion tapes,
(c) skill test, and (d) final exam. Analyses of results
indicated that the concept test and criterion tape accuracy
scores were the best indicators of observer competence.
Materials used in their training included
prepared audiotapes and videotapes, live simulated
situations, and a priori scripts of selected verbal
behaviors.


Generalizations  Based  upon  the 
Review  of  Literature 

An  evaluation  method  appropriate  for  this  study 
could  have  been  derived  from  methods  and/or  models  from  both 
the  field  of  research  and  the  field  of  evaluation.  The 
principal  criteria  which  should  be  used  for  evaluating  the 
methods  selected  are  credibility  and  practicality.  The 
evaluation design should include data regarding both the
process of training (opinion) and the outcome (empirical) of
the training, that is, both qualitative and quantitative measures.
An evaluator should seek to discover if trainees liked the
training  and  if  they  learned  from  the  training.  Methods 
which  seemed  to  have  particular  application  to  this  training 
package  included  direct  and/or  indirect  questioning,  trainer 
observation,  participant  interviews,  questionnaires,  rating 
scales,  development  of  statistical  measures  leading  to  some 
form  of  descriptive  statistics,  objective  tests,  and  tailor- 
made  techniques  for  evaluating  skill. 

In  measuring  performance,  any  evaluation  method 
which  is  developed  should  include  goals  or  objectives 
defined in behavioral terms, situations in which the
objectives can be shown, collection of performance data
during  those  situations,  and  comparison  of  the  data  with 
previously  selected  objectives.  The  two  methods  of 
assessing  observer  competence  that  appeared  to  have  the 
greatest  potential  for  effectiveness  were  an  objectively 
scored  written  test  on  concepts  and  a criterion  tape 
observation  accuracy  score.  Although  the  researchers 
reported  no  significant  difference  between  using  an  expert's 
coding  as  an  objective  standard  of  correct  coding  and  using 
average  reliability,  the  use  of  the  former  method  should 
assist  in  establishing  a consistent  standard  which  would 
facilitate  the  application  of  the  instrument  in  any  locale. 


CHAPTER  IV 

DESIGN  AND  DEVELOPMENT 


Chapter  IV  has  five  sections  covering  the  design  and 
development  of  the  formative  observation  instrument, 
training  method,  training  materials,  the  design  of  the 
study,  and  the  statistical  treatment  selected  for  the  study. 
The  process  used  to  develop  the  formative  instrument  is 
discussed  in  the  first  section.  Incorporated  in  this 
development  process  were  methods  and  suggestions  published 
by  researchers  in  the  field  of  teacher  evaluation.  These 
research  findings  are  reported  briefly  to  establish  a 
rationale  for  the  process  selected.  The  second  section 
contains  a description  of  the  method  designed  for  training 
observers.  The  third  section  covers  the  types  of  materials 
used.  The  design  of  the  study  is  detailed  in  the  fourth 
section,  and  the  chapter  closes  with  a description  of  the 
statistical  treatment  selected  for  the  study. 

Developing the
Formative Observation Instrument

Medley,  Coker,  and  Soar  (1984),  in  their  book 
Measurement-Based  Evaluation  of  Teacher  Performance:  An 
Empirical  Approach,  suggested  that  the  first  task  in  the 
development of a formative instrument was to identify a set
of  dimensions  along  which  the  performances  of  competent 
teachers  would  differ  from  those  of  incompetent  ones.  They 
suggested  three  strategies  which  could  be  used  to  choose 
dimensions  to  evaluate.  One  was  to  begin  with  a plausible 
theory  of  teaching  and  derive  specifications  of  teacher 
behavior  from  it.  A second  strategy  was  to  use  whatever 
most  educators  agree  constitutes  effective  teaching.  The 
third strategy was to "turn to the process-product research
and select a proven instrument [or] . . . build an
instrument to measure those dimensions of teacher behavior
that the research has shown to be related to effectiveness"
(p. 60).

Medley,  Coker,  and  Soar's  (1984)  third  strategy 
appeared  to  be  the  most  appropriate  model  to  follow  in  the 
development  of  the  observation  instrument  for  this  study. 

The  need  for  a more  effective  observation  instrument  in 
physical  education  was  initially  identified  through  the 
implementation  of  the  Florida  Performance  Measurement  System 
(FPMS),  a system  based  specifically  on  process-product 
research  studies  of  teaching,  when  supervisors  working  with 
physical  education  teachers  found  that  many  teaching 
strategies  which  were  considered  necessary  for  the  teaching 
of  activity  classes  were  not  identified  on  the  formative 
instruments,  particularly  the  one  for  instructional 
organization and development. Since the observation
instrument developed in this study was intended to be used
with the FPMS, it was logical to base it on process-product
research as well.

A survey  of  the  literature  of  physical  education 
pedagogy  in  a motor  skill  and/or  activity  setting  was 
conducted  to  pinpoint  the  teaching  strategies  which  were 
needed  for  effective  instruction  in  physical  education. 
Initially,  the  plan  was  to  include  only  behaviors  reported 
by  the  physical  education  literature.  However,  several 
teaching  behaviors  which  formed  part  of  the  FPMS,  but  were 
not  indicated  by  any  of  the  physical  education  studies,  were 
included  because  they  had  strong  support  in  the  effective 
teaching  literature  and  appeared  to  be  directly  applicable 
to  the  behaviors  that  an  effective  physical  educator  should 
display  in  an  activity  setting.  Only  studies  which 
demonstrated  some  relationship  between  teacher  behavior  and 
student  learning  were  used,  and  only  teacher  behaviors 
judged  to  be  generic  to  activity  classes  were  included  on 
the  instrument.  Findings  from  the  studies  which  were  used 
indicated  that,  while  some  teacher  behaviors  may  be 
effective  in  increasing  student  achievement,  other  behaviors 
may show a negative relationship to achievement; therefore,
provision was made for coding of both effective and
ineffective behaviors.

Many  of  the  studies  reviewed  for  the  development  of 
the  original  FPMS  instrument  used  the  scores  of  standardized 
tests applied in a pretest-posttest model as their dependent
variable;  however,  such  was  not  possible  for  the  development 
of  this  instrument  because  there  were  no  commonly  accepted 
standardized  tests  in  physical  education  except  in  the  area 
of  fitness.  Instead,  the  physical  education  researchers 
usually  depended  upon  individual,  researcher-designed  tests. 
Some  of  the  indicators  which  were  selected  for  this  study 
came  from  the  findings  of  experimental  and  process-product 
research  studies;  others  came  from  correlational  studies. 

Indicators  of  student  behavior  were  not  included  in 
this  study.  A number  of  researchers  have  indicated, 
however,  that  many,  if  not  most,  student  behaviors  were 
caused  by,  or  could  be  controlled  by,  effective 
instructional  organization  (Florida  DOE,  1983). 

Just  as  important  as  the  selection  of  the  indicators 
for  any  instrument  is  the  description  of  the  teacher 
behavior  included  in  these  indicators.  Medley,  Coker,  and 
Soar  (1984)  affirmed  that  the  most  difficult  task  in 
developing  an  observation  instrument  is  the  definition, 
specifically  and  in  detail,  of  just  what  is  meant  by  each 
dimension  of  teacher  performance  that  will  be  measured.  A 
further description of the specificity which was required
was expressed by Darling-Hammond, Wise, and Pease (1983), who
stated  that  evaluation  processes  useful  for  improvement  must 
yield  rich,  descriptive  information  which  points  out  sources 
of  difficulty  as  well  as  acceptable  courses  for  change.  The 
studies  were  carefully  reviewed  to  obtain  a full  description 
of  each  of  the  teacher  behaviors  which  were  indicated  by  a 
particular  study.  Behavioral  examples  for  each  of  those 
behaviors  were  then  developed  and  included  in  a training 
manual.  In  addition,  visual  examples  of  the  indicators  were 
provided  for  in  training  videotapes. 

After  a survey  of  the  physical  education  literature 
revealed  support  for  the  inclusion  of  other  behaviors  on  the 
observation  instrument,  various  categories  and  indicators 
were  added  or  deleted  to  make  the  instrument  more  effective 
for an activity setting. Martin (1977) pointed out that a
sound category observation instrument "must be objective,
relevant, parsimonious, efficient, reliable, and valid"
(p. 44). These standards were used for the selection,
addition, and deletion of categories and indicators on the
observation instrument.

After  specific  behavioral  indicators  were  selected, 
they  were  grouped  into  five  major  categories  with  8-14 
specific  behavioral  indicators  in  each  category.  The 
process  of  categorizing  was  basically  one  of  successive 
approximation in which numerous behavioral descriptions were
gradually  channeled  into  a finite  number  of  behavioral 
indicators  which  were  then  grouped  into  categories  for  ease 
of  coding.  Following  Martin's  (1977)  suggestion  that  it  is 
preferable,  during  the  early  development,  to  have  too  many 
rather  than  too  few  categories,  the  first  draft  contained 
more  indicators  than  now  appear  in  the  instrument.  The 
instrument  was  used  on  a trial  basis  for  observation  and 
counseling.  During  the  trial  period  additional  changes  in 
the  instrument  were  indicated.  Two  new  categories  were 
added: Demonstrations and Supervised Performance. The
indicators in Supervised Performance were taken from the
research  on  practice.  The  title  Supervised  Performance  was 
chosen  to  indicate  a wide  range  of  applicability  in  activity 
settings  since  any  time  students  are  actively  participating, 
they  are  involved  in  some  type  of  practice.  Extensive 
changes were made in the categories of Feedback, Efficient
Use of Time, and Lesson Development. The instrument
developed  for  this  study  is  presented  in  Appendix  B. 

The  instrument  was  formatted  in  the  order  in  which 
the  teacher  behaviors  were  most  commonly  observed  during  a 
lesson.  Either  of  the  indicators  Begins  Classwork 
Promptly/Delays  Starting  the  Lesson  was  observed  and  tallied 
at  the  start  of  the  lesson.  Other  indicators  in  the  first 
two  categories  also  were  observed  at  the  beginning  of  a 
lesson (e.g., Materials in Order, Orients, Conducts Review).
Some of the indicators in these two categories, however,
were observed throughout the lesson (e.g., Provides
Activities/Gives Directions, Talks on Subject Matter,
Questions Students Comprehension). Indicators in the
category of Demonstrations (with the exception of Safety
Points Are Covered) were tallied only during demonstrations,
and since demonstrations were generally presented before
practice,  these  indicators  were  the  next  ones  placed  on  the 
observation  instrument.  After  a demonstration,  students 
were  generally  given  practice  on  the  skill  which  was 
demonstrated,  so  the  category  of  Supervised  Performance  was 
listed  next  on  the  instrument.  The  Feedback  category  was 
placed  last  on  the  instrument  since  feedback  comments  were 
generally  made  while  the  students  were  practicing. 

As  teacher  behaviors  were  revealed  from  research 
data,  the  decision  had  to  be  made  as  to  whether  or  not  these 
teacher  behaviors  would  fit  under  a current  indicator  or 
whether  the  addition  of  a new  indicator  was  necessary.  In 
some  cases,  it  was  possible  to  group  two  behaviors  together 
since  the  teacher  behavior  was  similar.  Decisions  as  to 
which  categories  and  indicators  to  keep  or  discard  were  made 
on  the  basis  of  empirical  testing  in  actual  classes  and 
through  analyzing  videotaped  observations  during  the  trial 
period.

One problem which led to the elimination of some
teacher  behaviors  indicated  by  the  research  was  the 
difficulty  of  writing  a definition  for  the  behavior  which 
would  render  it  a low-inference  indicator.  Since  a leading 
factor  negatively  affecting  the  reliability  of  an 
observation  instrument  is  the  use  of  high-inference 
categories  (i.e.,  poorly  defined  or  complex  cues;  Frick  & 
Semmel,  1978),  a concerted  effort  was  made  to  use  low- 
inference  items  on  the  instrument.  In  line  with  the 
findings  of  Medley  and  Norton  (cited  by  Frick  & Semmel, 
1978),  objectivity  was  ensured  by  defining  categories  so 
that  discriminations  were  based  on  relatively  obvious  and 
easily  recognized  cues  and  on  cues  which  were  not  dependent 
on  sophisticated  knowledge  of  physical  education  or  on  the 
observer's  own  set  of  values. 

The  decision  as  to  the  total  number  of  categories 
and  indicators  which  should  be  on  the  instrument  was  a 
critical  one.  There  had  to  be  a sufficient  number  of 
indicators  to  cover  all  the  important  behaviors  which  should 
be  observed  in  an  activity  class.  Martin  (1977)  also 
pointed  out  that  observers  should  be  kept  continuously  busy 
(at  a reasonable  and  consistent  rate)  so  that  their  personal 
inference-making  would  not  interfere  with  the  objective- 
descriptive  process.  According  to  Medley  and  Mitzel  (1963), 
if  observers  have  time  to  code  on  the  basis  of  their  general 
impressions, the data obtained from the category observation
instrument  will  be  distorted  in  terms  of  reliability, 
validity,  or  both.  However,  the  instrument  cannot  contain 
more  indicators  than  can  be  facilely  tallied. 

In  developing  an  observation  instrument,  it  is 
important  to  be  sure  that  those  who  use  the  instrument  have 
all  the  information  which  is  needed  to  use  it  effectively. 
Martin  (1977)  reported  that  the  major  structural  components 
of  a category  observation  instrument  are  (a)  a set  of 
operationally  defined  categories  of  behavior,  (b)  a set  of 
rules  and  priorities  for  observation  and  coding,  (c)  a 
standardized  recording  form  (e.g.,  matrix,  grid),  and  (d)  a 
series  of  instructions  for  organizing  and  analyzing  the 
observational  data.  The  first  two  components,  according  to 
Martin,  are  absolutely  necessary,  whereas  the  final  pair, 
while  increasing  the  overall  utility  of  an  instrument,  may 
be  omitted  at  times.  The  recording  format  chosen  for  the 
instrument  was  the  same  as  that  used  by  the  FPMS  since  the 
primary  purpose  for  developing  the  instrument  was  to  use  it 
in  conjunction  with  the  FPMS  in  the  Florida  Beginning 
Teacher  Program.  The  rules  for  observation  and  coding, 
along  with  examples  of  each  indicator,  were  established  and 
included in a Coding Manual¹ (sample introductory pages in
Appendix C). The Coding Manual also contained information
about  how  to  analyze  the  behaviors  observed.  A suggested 
postconference  data  summary  sheet  was  also  included. 

One  of  the  decisions  which  has  to  be  made  on  any 
observation  instrument  is  whether  the  behavioral  units 
should  be  coded  in  terms  of  natural  events  or  in  terms  of 
time.  The  advantage  of  time  units  is  that  they  can  be 
consistently  employed  across  a number  of  observers  and  over 
a number  of  categories;  however,  natural  units,  according  to 
Martin  (1977),  produce  results  which  are  easier  to 
interpret.  He  suggested  that  the  number  of  smiles  a teacher 
emits  in  an  hour  class  period  is  probably  more  meaningful 
than  the  number  of  3-second  periods  in  which  his  or  her 
facial  expression  was  predominantly  that  of  smiling.  The 
FPMS  developers  chose  natural  units  rather  than  a time-basis 
or  sign  system  after  analyzing  the  results  of  an  FPMS  study 
comparing  three  different  methods  of  coding  an  observation 
instrument  (Florida  DOE,  1984).  Coding  according  to  natural 
units,  or  frequency  of  occurrence,  therefore,  was  chosen  as 
the  method  of  data  collection  for  this  instrument  since  it 
had been used by the FPMS developers. When an observer
codes according to frequency of occurrence, a mark or tally
must be made each time an indicated behavior occurs. The
recommended display technique for this instrument is that of
straight up and down tally marks (i.e., IIII). When a fifth
tally mark is made, the observer runs a diagonal slash
through the first four. Each grouping is then easily
identified as totaling five tallies. Specific rules on when
and how often to tally, as well as examples of teacher
behavior, were included in the Coding Manual.

¹ Readers who are interested in obtaining the Coding
Manual and other training materials should contact Virginia
Sharpe, 311 Blue Lake Terrace, DeLand, FL 32724.

Training  Method 

A group  of  30  educators  from  one  school  district  was 
asked  to  participate  in  the  study.  They  represented  a broad 
spectrum of educators: school- and district-level
administrators, supervisors, and teachers. A memo was sent
to  local  principals  and  assistant  principals,  district-level 
staff,  and  physical  education  teachers  describing  the 
instrument  and  the  training  process  and  encouraging  them  to 
participate.  The  memo  contained  a "tear-off  slip"  with  a 
request  that  it  be  returned  if  the  person  was  interested  in 
becoming  a part  of  this  study.  Random  selection  from  among 
those  who  returned  their  slip  led  to  the  selection  of  three 
groups  of  10.  Two  of  the  groups  went  through  a training 
program  and  had  their  observer  competence  checked  by 
observing  a criterion  videotape  and  taking  a written 
indicator check (test). The first training group went
through  a program  conducted  by  the  developer  of  the 
instrument.  To  test  the  effectiveness  of  the  training  with 
someone  other  than  the  developer,  the  second  group  of  10 
went  through  the  training  program  under  the  direction  of  one 
of  the  participants  who  trained  during  the  first  program. 
This resulted in a measure of effectiveness under second-level
training. Both training leaders used the same training
manual,  materials,  and  process.  A third  group  acted  as  the 
control  group.  The  control  group  used  the  instrument  to 
observe  the  criterion  videotape  and  took  the  written 
indicator  check  without  the  benefit  of  going  through  a 
training  program. 
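
The random selection of three groups of 10 from the returned slips can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the study does not report the mechanics of the draw, and the function name `assign_groups`, the group labels, the pool of 40 placeholder volunteers, and the seed are all hypothetical.

```python
import random


def assign_groups(volunteers, group_names, group_size, seed=None):
    """Randomly draw group_size volunteers for each named group, without overlap."""
    needed = group_size * len(group_names)
    if len(volunteers) < needed:
        raise ValueError("not enough volunteers to fill every group")
    rng = random.Random(seed)
    # Sampling without replacement gives every volunteer the same chance
    # of landing in any of the groups.
    chosen = rng.sample(volunteers, needed)
    return {
        name: chosen[i * group_size:(i + 1) * group_size]
        for i, name in enumerate(group_names)
    }


# Hypothetical pool of returned tear-off slips (placeholder IDs, not study data).
volunteers = [f"educator_{n:02d}" for n in range(1, 41)]
groups = assign_groups(
    volunteers,
    ["developer_trained", "second_level_trained", "control"],
    group_size=10,
    seed=1988,  # arbitrary seed so the illustration is repeatable
)
```

Drawing all 30 participants in one sample and then slicing the result keeps the three groups mutually exclusive, which is the property the posttest-only design relies on.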

A week before the training session, participants
in  the  two  treatment  groups  were  given  a copy  of  the 
research  covering  the  categories  and  indicators  on  the 
observation  instrument;  the  Coding  Manual,  containing  rules 
for  coding  and  specific  behavioral  examples  of  all 
indicators;  a copy  of  the  observation  instrument;  and  a 
practice  comprehension  check  and  key.  Trainees  were  asked 
to  take  15-20  minutes  a day  for  the  six  days  before  the 
training  session  to  study  the  material.  A typical 
assignment  instructed  participants  to  read  the  research  on 
one  of  the  categories  (e.g.,  Efficient  Use  of  Time),  review 
the  behavioral  indicators  in  that  category  in  the  Coding 
Manual, and find the indicators on the observation
instrument. After completing that work, they were to take a
short multiple-choice comprehension check. This practice
test contained brief vignettes of physical education classes;
for each vignette, the trainee was to select the answers
which appeared to relate best to the situation described.
The answers were provided for the comprehension
check  so  that  the  participants  could  check  how  well  they 
were  comprehending  the  system.  The  next  day  they  were  asked 
to  do  the  same  with  another  category,  and  so  on  until  they 
covered  all  five  categories.  Trainees  were  asked  to  make 
notes  on  questions  which  arose  so  they  could  be  answered 
during  the  training  session. 

The  actual  training  session  lasted  for  5-1/2  hours. 
It  opened  with  a question-and-answer  period  on  the  materials 
sent  ahead  of  the  session  and  a discussion  of  formative  and 
summative  observations,  cautions,  and  inferences.  The 
categories  and  indicators  were  discussed  as  a videotape 
containing  examples  of  the  indicators  was  played.  This 
procedure  followed  the  recommendation  of  Martin  (1977),  who 
pointed  out  that  a good  deal  of  time  should  be  spent 
specifying,  in  simple  empirical  terms,  the  behavioral  cues 
to  which  the  observers  must  respond.  Although  the 
instrument  was  coded  from  a videotape  for  final  evaluation 
purposes,  actual  coding  of  the  instrument  was  initiated  by 
using an audiotape of isolated teaching incidents since the
use  of  an  audiotape  freed  the  participants  from  having  to 
look  at  both  a videotaped  lesson  and  the  instrument  at  the 
same  time.  Use  of  the  audiotape  enabled  the  participants  to 
listen  to  the  vignette  while  scanning  the  instrument,  thus 
aiding  the  participants  in  becoming  familiar  with  the  format 
of  the  instrument  and  the  location  of  the  indicators  and 
categories. Each vignette was presented, coded, and
discussed.

Following  this,  2 hours  of  observation  coding 
practice  was  conducted  in  which  participants  used  the 
instrument  for  coding  while  watching  videotapes  of  activity 
classes.  Participants  observed  a videotape  and  coded  for  a 
preset  amount  of  time.  The  length  of  time  the  observers 
marked  the  tape  was  gradually  increased.  Participants  were 
encouraged  to  mark  any  behaviors  they  observed  and  to  code 
those  behaviors  on  the  theoretical  bases  established  by  the 
research.  They  were  told  to  resist  a tendency  to  draw  on 
their  knowledge  and  experience  to  make  more  subtle 
discriminations  than  were  called  for.  Participants  were 
told  to  tally  observations  based  only  on  first  impressions 
and  not  to  infer  hidden  meanings  in  teacher  behavior. 
Observers  were  also  told  to  make  no  evaluations  as  to  how  a 
teacher  should  teach  a certain  activity,  only  to  decide 
whether  or  not  the  teacher  was  performing  one  of  the 
indicators on the instrument. After each observation/
coding  session,  the  tape  was  stopped,  the  behaviors 
discussed  (debriefed),  and  the  tape  replayed,  when 
necessary,  to  help  familiarize  the  observers  with  the 
behaviors. During the entire training session, observers
were given immediate feedback on the correctness of their
observations.


Training  Materials 

During  the  training,  extensive  use  was  made  of 
videotapes  taken  of  physical  education  teachers  actually 
teaching  their  classes.  The  videotapes  were  edited  and 
organized  for  use  in  the  training.  Although  Frick  and 
Semmel  (1978)  stated  that  videotapes  of  unambiguous  examples 
were  sufficient  in  training  effective  observers,  Mash  and 
McElwee  (1974)  found  that  observers  trained  to  code 
unpredictable  sequences  showed  a greater  maintenance  of 
accuracy  under  new  observation  conditions  than  those  trained 
to  code  predictable  behavior  sequences.  This  suggested  that 
observers  trained  with  unpredictable  sequences  may  be  more 
flexible in responding to new situations. Because of these
findings, a videotape edited to provide unambiguous examples
was used first to illustrate the identified teacher
behaviors. Once the practice observation process began,
videotapes of more complex teacher behavior were used in
order to facilitate observer flexibility. For this latter
reason, a realistic tape, rather than an unambiguous one,
was used as the criterion.

In  order  to  facilitate  the  process  of  locating 
specific  portions  of  the  initial  audiotape,  the  number  of 
each incident was announced before the vignette. Tape
recorder counter readings, as used by Roderick (1975), were
not listed because of the wide variety of tape recorders
available and the lack of correspondence among their
counters. On the initial videotape of unambiguous examples,
written  and  oral  announcements  were  made  on  the  tape  just 
before each vignette. Scripts listing the behavioral
incidents and the appropriate coding for all videotapes and
the audiotape were placed in the training manual.

At the end of the training, each participant's
knowledge of the observation instrument was evaluated by two
methods. A comprehension check, a written objective test,
was administered to measure the observer's objective
knowledge of the categories and indicators (Appendix G).
Viewing a criterion videotape while using the instrument to
code observed teacher behavior measured the observer's
ability to apply coding knowledge. The criterion score for
the  videotape  was  established  by  the  developer's  coding  of 
the  tape.  Members  of  the  experimental  groups  were  also 
asked,  by  means  of  two  surveys  using  Likert  scales 
(Appendix E), to evaluate the effectiveness of the specific
manuals, tapes, and strategies used during the training.
The surveys were organized around Herbert's (cited by
Herbert & Attridge, 1975) three criteria: identifying,
validity, and practicality criteria. The training materials
were  also  submitted  to  two  educators,  experienced  in 
training  persons  to  use  teacher  observation  instruments,  for 
their  evaluation  using  the  same  survey  instrument. 

One  of  the  observers  trained  in  the  initial  group 
then  trained  other  observers  using  the  training  materials 
developed  for  the  initial  session.  The  same  indicator 
check,  criterion  tape  observation,  and  surveys  were  given  at 
the  end  of  the  training.  The  results  of  these  evaluations 
provided  a measure  of  the  relative  effectiveness  of  the 
second-level  training  session. 

Design  of  the  Study 

A single  factor  design  with  three  levels  was 
selected  for  this  part  of  the  study.  It  is  similar  to  Ary, 
Jacobs,  and  Razavieh's  (1979)  Design  3,  which  is  described 
as  a two-group,  randomized  subject,  posttest-only  design. 
This  part  of  the  study  can  be  depicted  as  follows: 
                      Type of Training

  No Training     Training by Developer     Second-level Training


The design chosen is one of the simplest, yet one of
the most powerful of all experimental designs (Ary, Jacobs,
& Razavieh, 1979). It required three randomly assigned
groups  of  subjects,  each  assigned  to  a different  condition. 
Two  groups  were  exposed  to  the  experimental  treatment. 
Members  of  all  three  groups  were  measured  on  the  dependent 
variables.  The  main  advantage  of  this  design  was 
randomization,  which  assured  statistical  equivalence  of  the 
groups  prior  to  the  introduction  of  the  independent 
variable.  No  pretest  was  used  since  randomization 
controlled  for  any  possible  extraneous  variables  and  assured 
that  any  initial  differences  between  the  groups  were 
attributable  only  to  chance  and  therefore  would  follow  the 
laws  of  probability. 

Internal  validity 

The  design  which  was  chosen  controlled  for  the  main 
effects  of  maturation,  pretesting,  and  history.  Since  no 
pretest  was  used,  there  was  no  interaction  effect  of 
pretesting and the performance measures. The brevity of
the training controlled for maturation effects
such as fatigue and experimental mortality. Randomization
also controlled for differential selection of subjects and
selection-maturation interaction.

External  validity 

Use  of  the  principle  of  randomization  made  some 
generalization  of  study  data  and  results  defensible,  and  the 
use  of  another  trainer  for  the  second  experimental  treatment 
lent  a measure  of  ecological  validity.  A comparison  of  the 
results  of  the  second  trainer  was  intended  to  provide 
assurance  that  the  experimental  effect  was  independent  of 
the techniques or personality of the developer. An
interaction effect due to what is commonly called the
Hawthorne effect (Ary, Jacobs, & Razavieh, 1979), in which a
subject's behavior may be influenced partly by his or her
perception of the experiment, was controlled since all groups
were subject to some type of treatment (no training, yet
took a coding test; training by the developer; training by a
second-level trainer).

Statistical  Analysis 
Criterion  Proportions 

Criterion  proportions  were  established  for  both 
effective  and  ineffective  indicators  on  the  criterion  tape. 
The  same  calculations  were  made  for  each  trainee.  The 
difference between a trainee's proportions and the criterion
proportions became the accuracy measure of each trainee.
The accuracy measures and the score on the comprehension
check were analyzed using a one-way analysis of variance. A
Bonferroni t-test was then conducted on any significant
F-ratios.
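
One way the accuracy measure could be computed is sketched below in Python. The text does not give the exact formula, so everything here is an assumption made for illustration: a "proportion" is taken to be each indicator's share of the tallies in its set (effective or ineffective), and the accuracy measure is taken to be the sum of absolute differences between the trainee's and the criterion's proportions. The function names and the tally counts are hypothetical; only the indicator names come from the instrument.

```python
def proportions(tallies):
    """Convert raw tally counts per indicator into shares of the total."""
    total = sum(tallies.values())
    if total == 0:
        return {name: 0.0 for name in tallies}
    return {name: count / total for name, count in tallies.items()}


def accuracy_gap(trainee_tallies, criterion_tallies):
    """Sum of absolute differences between trainee and criterion proportions.

    Assumption: a smaller value means closer agreement with the
    criterion coding; zero means identical proportions.
    """
    trainee = proportions(trainee_tallies)
    criterion = proportions(criterion_tallies)
    indicators = set(trainee) | set(criterion)
    return sum(
        abs(trainee.get(name, 0.0) - criterion.get(name, 0.0))
        for name in indicators
    )


# Invented tallies for three effective indicators (illustration only).
criterion_effective = {
    "Begins Classwork Promptly": 1,
    "Orients": 2,
    "Conducts Review": 1,
}
trainee_effective = {
    "Begins Classwork Promptly": 1,
    "Orients": 1,
    "Conducts Review": 2,
}
gap = accuracy_gap(trainee_effective, criterion_effective)
```

Under these assumptions the same two functions would be applied a second time to the ineffective indicators, giving each trainee one score per set for the analysis of variance.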

Survey  Scoring 

Descriptive  statistics  were  computed  for  the  survey 
instrument,  a Likert-type  scale.  Means  for  each  question 
were  established  and,  when  indicated,  the  data  were  analyzed 
using  an  independent  samples  t-test.  The  object  was  to 
determine  which  trainer,  if  either,  was  evaluated  more 
positively,  and  also  to  measure  how  valuable  the  trainees 
considered  the  training  materials. 
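
The survey scoring described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually run in the study: it assumes the classic pooled-variance form of the independent samples t-test, and the function names and the Likert ratings are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def question_means(responses):
    """Mean rating per question; responses is one list of ratings per respondent."""
    return [mean(column) for column in zip(*responses)]


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Invented ratings on one survey question for the two trained groups.
developer_group = [4, 5, 4, 5, 4]
second_level_group = [3, 4, 3, 4, 3]
t_stat, df = independent_t(developer_group, second_level_group)
```

The resulting statistic would then be compared with the critical value of t for the given degrees of freedom at the chosen alpha level.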


CHAPTER  V 

ANALYSIS  AND  INTERPRETATION  OF  DATA 

The  data  obtained  by  administering  the  comprehension 
check  and  the  criterion  tape  accuracy  observation  to  all 
three  groups  and  the  two  surveys  to  the  treatment  groups  are 
presented  in  the  first  part  of  this  chapter.  Presented  in 
the  latter  part  are  interpretations  that  were  made  after 
analyzing  the  data.  The  results  are  organized  and  presented 
in the following order: (a) the comprehension check,
(b) the criterion tape accuracy results, and (c) the survey
results. Alpha was set at the .05 level of significance.
When  obtained,  significance  levels  of  .01  were  indicated. 

Analysis 

Comprehension  Check 

Each  participant  in  the  study  took  a multiple-choice 
comprehension check (written test). The mean, range, and
standard deviation for each group were calculated and are
displayed, along with the raw scores, in Appendix F,
Table F-1. An analysis of variance (ANOVA) of the three
groups  was  also  conducted.  Analysis  of  variance  is  a way  of 
measuring  numerically  how  different  the  means  of  the  three 
groups are and how much the scores are spread around their
respective  means.  With  these  two  measures,  it  is  possible 
to  tell  whether  the  means  differ  significantly  or  not.  The 
total  variance  in  ANOVA  is  analyzed  into  the  within-group 
variance  (a  result  of  random  people  differences,  an  error 
variance,  or  unexplainable),  and  into  between-group  variance 
(a  combination  of  the  covariate  effects  and  the  main 
treatment  effects) . The  assumption  underlying  the  analysis 
of  variance  procedure  is  that  if  the  groups  to  be  compared 
are  truly  random  samples  from  the  same  population,  then  the 
between-group  mean  square  should  not  differ  from  the  within- 
group  mean  square  by  more  than  the  amount  we  should  expect 
from  chance  alone. 
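
The partition described above can be made concrete with a short sketch. It is a generic one-way ANOVA written for illustration only; the function name `one_way_anova` and the scores are hypothetical and are not the study's data.

```python
def one_way_anova(groups):
    """Partition total variability and return the F-ratio for a list of groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2 for group in groups for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical scores for three small groups.
anova = one_way_anova([[1, 2, 3], [4, 5, 6], [4, 5, 6]])
```

When the groups are random samples from one population, the two mean squares estimate the same variance and the F-ratio stays near 1; a large F-ratio signals that the group means are farther apart than chance alone would explain.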

As  shown  in  Table  2,  a significant  (p  < .01)  F-ratio 
was  obtained  on  the  comprehension  check.  When  there  is  a 
significant  F-ratio  the  conclusion  is  made  that  the  observed 
difference  among  the  sample  means  is  significant  and  the 
population  means  are  therefore  different.  In  other  words, 
the  measures  obtained  from  the  groups  involved  differ  and 
the  differences  are  greater  than  one  would  expect  by  chance 
alone.  A significant  F-ratio,  however,  does  not  necessarily 
mean  that  all  groups  differ  significantly  from  all  other 
groups. In order to find if any of the groups differed
significantly from any of the other groups, a Bonferroni
t-test was conducted. The results are displayed in Table 3.
The test affirmed the presence of a treatment effect between
Group A and both Group B and Group C, although the effect
was not significant. There was no difference between
Group B and Group C since their mean scores were identical.

Table 2

Summary of the Analysis of Variance of the
Three Groups: Comprehension Check

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups     114.8      2     57.40    8.11  .01
Within Groups      184.2     26      7.08
Total              299.0     28

Table 3

Results of Bonferroni t-Test for
Significant Analysis of Variance
Results: Comprehension Check

Groups                      Level of
Compared(a)    Result       Significance

A-B             0.70        ns
A-C             0.70        ns
B-C             0.00        ns

Note. The term "ns" means nonsignificant
in this and in subsequent tables.

(a) Group A: No training
    Group B: Developer trained
    Group C: Second-level training
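
The entries of an ANOVA summary table follow from the sums of squares and degrees of freedom alone, so Table 2 can be rechecked with a small helper. This is an illustrative sketch; the function name `anova_summary` is hypothetical, and the inputs are the SS and df values reported in Table 2.

```python
def anova_summary(ss_between, ss_within, df_between, df_within):
    """Derive the remaining entries of an ANOVA summary table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_between + ss_within,
        "df_total": df_between + df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# SS and df values as reported in Table 2 (comprehension check).
table2 = anova_summary(ss_between=114.8, ss_within=184.2, df_between=2, df_within=26)
```

Computed from unrounded mean squares, the F-ratio is about 8.10; the printed value of 8.11 corresponds to dividing the rounded mean squares (57.40 / 7.08).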

Criterion  Accuracy  Measure 

The criterion accuracy score (on the observation) of
each participant was analyzed in relation to a predetermined
criterion. Tallies made by observer-raters in each
indicator were grouped and totaled by category. In
addition, tallies on either side of the instrument, the
effective and ineffective side, were handled separately.
This led to 10 accuracy scores per participant. The
individual raw scores (number of tallies made by each
participant) and the accuracy scores (the difference between
an individual rater's number of tallies and the criterion
score) on the 10 measures are found in Appendix F in
Tables F-2 through F-11. At the bottom of each table is the
criterion score for that category, the mean of the accuracy
scores (showing the average difference between the raw score
and the criterion score), the standard deviation, and the
range of the raw scores.
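
The summary statistics at the bottom of each appendix table can be sketched as below. Several details are assumptions made for illustration: the accuracy score is taken as the signed difference (raw tallies minus criterion), the standard deviation is the sample form, and the tallies are hypothetical. Only the criterion value of 19 is one of the criterion scores mentioned in this chapter.

```python
from statistics import mean, stdev


def accuracy_summary(raw_tallies, criterion):
    """Summarize one category: accuracy scores, their mean and SD, raw range."""
    # Assumed sign convention: positive means more tallies than the criterion.
    accuracy = [raw - criterion for raw in raw_tallies]
    return {
        "accuracy": accuracy,
        "mean_accuracy": mean(accuracy),
        "sd": stdev(accuracy),  # sample standard deviation (assumption)
        "range": (min(raw_tallies), max(raw_tallies)),
    }


# Hypothetical tallies from four raters in one category with criterion 19.
summary = accuracy_summary([17, 19, 22, 15], criterion=19)
```

A mean accuracy near zero indicates that a group's tallies cluster around the criterion, while a large negative or positive mean shows systematic under- or over-tallying.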

Table 4

Summary of the Analysis of Variance of the
Three Groups: Efficient Use of Time (Effective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups     422.04     2    211.02   15.95  .01
Within Groups      344.10    26     13.23
Total              766.14    28

Table 5

Summary of the Analysis of Variance of the
Three Groups: Efficient Use of Time (Ineffective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups      16.99     2      8.50    4.74  .05
Within Groups       46.46    26      1.79
Total               63.45    28

Table 6

Summary of the Analysis of Variance of the
Three Groups: Lesson Development (Effective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups     313.99     2    157.00    9.53  .01
Within Groups      428.22    26     16.47
Total              742.21    28

Table 7

Summary of the Analysis of Variance of the
Three Groups: Lesson Development (Ineffective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups      54.01     2     27.01    3.84  .05
Within Groups      183.02    26      7.04
Total              237.03    28

Table 8

Summary of the Analysis of Variance of the
Three Groups: Demonstration (Effective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups     101.02     2     50.51    5.62  .01
Within Groups      233.71    26      8.99
Total              334.73    28

Table 9

Summary of the Analysis of Variance of the
Three Groups: Demonstration (Ineffective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups       3.56     2      1.78     .82  ns
Within Groups       56.23    26      2.16
Total               59.79    28

Table 10

Summary of the Analysis of Variance of the
Three Groups: Supervised Performance (Effective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups     122.56     2     61.28    3.29  ns
Within Groups      484.89    26     18.65
Total              607.45    28

Table 11

Summary of the Analysis of Variance of the
Three Groups: Supervised Performance (Ineffective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups       0.91     2      0.46     .23  ns
Within Groups       52.40    26      2.02
Total               53.31    28

Table 12

Summary of the Analysis of Variance of the
Three Groups: Feedback (Effective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups      12.51     2      6.26     .30  ns
Within Groups      538.73    26     20.72
Total              551.24    28

Table 13

Summary of the Analysis of Variance of the
Three Groups: Feedback (Ineffective)

Source of                                          Level of
Variance             SS      df      MS       F    Significance

Between Groups      45.87     2     22.94    8.37  .01
Within Groups       71.16    26      2.74
Total              117.03    28

The results of the ANOVA for each of the 10
categories are presented in this chapter in Tables 4-13.
Six out of the 10 categories resulted in a main effects
F-ratio significant at an alpha level of .05 or .01. A
Bonferroni t-test for significance was conducted with each
of those six categories. Significant differences (p < .05)
were obtained in five of the categories. The results of the
t-test are shown in Tables 14-19. As can be seen from
studying the tables, significant differences were obtained
between Group A (the control group) and Group C on five
measures and between Group A and Group B on four of the
measures. There were no significant differences between
Groups B and C (the two treatment groups) on any of the six
measures.

Table 14

Results of Bonferroni t-Test for
Significant Analysis of Variance Results:
Efficient Use of Time (Effective)

Groups                      Level of
Compared       Result       Significance

A-B             4.10        .05
A-C             5.47        .05
B-C             0.28        ns

Table 15

Results of Bonferroni t-Test for
Significant Analysis of Variance Results:
Efficient Use of Time (Ineffective)

Groups                      Level of
Compared       Result       Significance

A-B             3.05        .05
A-C             3.75        .05
B-C             1.84        ns

Table 16

Results of Bonferroni t-Test for
Significant Analysis of Variance Results:
Lesson Development (Effective)

Groups                      Level of
Compared       Result       Significance

A-B             4.06        .05
A-C             3.52        .05
B-C             0.54        ns

Table 17

Results of Bonferroni t-Test for
Significant Analysis of Variance Results:
Lesson Development (Ineffective)

Groups                      Level of
Compared       Result       Significance

A-B             2.34        ns
A-C             2.50        ns
B-C             0.17        ns

Table 18

Results of Bonferroni t-Test for
Significant Analysis of Variance Results:
Demonstrations (Effective)

Groups                      Level of
Compared       Result       Significance

A-B             1.61        ns
A-C             3.36        .05
B-C             1.80        ns

Table 19

Results of Bonferroni t-Test for
Significant Analysis of Variance Results:
Feedback (Ineffective)

Groups                      Level of
Compared       Result       Significance

A-B             2.90        .05
A-C             3.95        .05
B-C             1.08        ns

Surveys

Two surveys that asked the trainees (those who went
through either of the training sessions) to evaluate (a) the
training materials and (b) the trainer's effectiveness were
analyzed. The descriptive data of the surveys, which
include the range of scores and the mean for Groups B and C
on each question, can be found in Appendix F, Tables F-12
through F-15. When the difference between the means of the
two groups was greater than .5, a t-test for independent
samples was conducted to establish whether or not the
difference between the means was significant. The results
of the t-tests are also found in Tables F-12 through F-15. On
the two surveys, the means of the two groups differed by
.5 or more on only 15 questions out of 59. After
conducting a t-test for these responses, only one response,
for question number 29, showed a significant (p < .05)
difference between the two groups.

Interpretation of Data

Comprehension Check

The mean score for the participants in the two training
sessions was the same, although the ranges and standard
deviations were different. The statistically
significant (p < .01) F-ratio when the three groups were
compared indicated the presence of a treatment effect.
These data lend a measure of support to the effectiveness of
the training sessions. The nonsignificant differences in the
pairwise comparisons (between the two training groups, and
between each training group and the control group) were most
likely due to several factors. First, a wide variation among
the participants in the number correct did not result in
clearly different means, mainly due to the small group sizes
and the relatively short test (32 questions). An additional
reason for a nonsignificant difference could be the
well-clarified wording of each of the indicators (used as
answers on the comprehension check). The wording of these
indicators had been chosen with great care to assure
low-inference choices during the observation and tallying
process. On the comprehension check, it was fairly easy for
the untrained as well as the trained subjects to select the
correct answer, in many cases, because of this clarity and
because of the specificity inherent in each of the questions
depicting a problem situation.

Criterion  Accuracy  Measure 

The ANOVA on each of the 10 criterion accuracy
categories resulted in significant F-ratios on 6 of them.
The significant differences in 6 of the 10 categories
indicated a definite treatment effect. Four of these
categories had an F-ratio significant at the .01 level:
Efficient Use of Time (effective), Lesson Development
(effective), Demonstrations (effective), and Feedback
(ineffective). The first three categories listed were those
in which a large number of tallies were made (criterion
scores of 19, 20, 20). The more tallies that can be made in
a category, the greater the chance of obtaining differences
that are significant. Supervised Performance (effective),
however, which had a relatively high number of expected
tallies (criterion score = 17), did not show this same
treatment effect. This lack of a significant difference may
be attributed to the participants' familiarity with using
practice as a teaching technique, thus leading to a large
number of tallies by all participants. The significant
differences on Feedback (ineffective), a low-tally indicator
(criterion = 7), can most likely be attributed to the raters'
lack of familiarity with these indicators until they had gone
through the training. Many of the control group
participants commented that they were sure some type of
feedback was being given, but were not always sure what was
meant by the wording used in the bottom six indicators in
that category. The differences in scores were, therefore,
directly related to the training sessions.

Two of the six categories had significant F-ratios
at the .05 level: Efficient Use of Time (ineffective)
(criterion = 2) and Lesson Development (ineffective)
(criterion = 9). The scores for both treatment groups on
each of these indicators were higher than the control group,
thus indicating that the training made participants more
aware of ineffective behaviors in these two categories.

A Bonferroni t-test for significance between groups
was conducted for each of the six categories that had
significant F-ratios. Because the only significant results of
the Bonferroni t-test were obtained between Group A (control
group) and either Group B or Group C (treatment groups), and
because these results varied more than we might expect from
chance, it is clear that the treatment resulted in a positive
difference in group performance.

Surveys 

On  the  surveys,  the  only  significant  difference  was 
on  question  number  29  on  the  first  survey  covering  training 
materials.  Respondents  were  asked  whether  or  not  the 
subject  of  costs  likely  to  be  incurred  in  the  use  of  the 
instrument  was  covered.  Group  B agreed  that  the  topic  of 
costs  had  been  covered;  Group  C said  it  had  not  been 
covered.  Since  the  item  on  the  costs  of  implementing  the 
system  is  covered  only  in  the  trainer's  manual,  it  seemed 
apparent  that  the  second-level  trainer  omitted  all  or  part 
of  that  section. 

On  the  other  questions  on  the  first  survey,  all 
respondents  agreed  or  strongly  agreed  with  the  stated 
criteria.  On  the  second  survey,  which  asked  about  trainer 
effectiveness,  all  respondents  agreed  or  strongly  agreed 
with  the  criteria  stated  in  each  question  with  the  exception 
of  one  question  concerning  whether  or  not  the  problem  of 
context had been explained. Respondents were undecided
about this topic. Again, this information was only in the
training manual, a fact that leads to the belief that the
second-level trainer did not cover the problems of context
in the observation process, or covered them very briefly.


CHAPTER  VI 

SUMMARY,  CONCLUSIONS,  IMPLICATIONS,  RECOMMENDATIONS 

Summary 

Background  and  Overview  of  Study 

The  need  for  a teacher  observation  formative 
instrument  to  use  with  beginning  physical  education  teachers 
in  activity  settings,  as  part  of  the  Florida  Beginning 
Teachers  Program,  led  to  this  study.  Although  the 
appearance  of  general  applicability  was  indicated  by  a study 
conducted  by  the  developers  of  the  Florida  Performance 
Measurement  System  (FPMS),  observers  noted  limitations  when 
using  the  instrument  with  physical  education  teachers  during 
physical  activity  instruction  in  such  areas  as  game  or  motor 
skills. 

A review  of  the  literature  revealed  a number  of 
studies  that  appeared  to  have  direct  application  to  the 
development  of  a viable  observation  instrument  for  physical 
education. An instrument with five major categories of
teacher behavior, comprising 64 indicators, was developed.
The categories covered behaviors related to the Efficient
Use of Time, Lesson Development, Demonstrations, Supervised
Performance, and Feedback. Care was taken to select
low-inference indicators and to define and delimit each of
the behavioral indicators.

Materials developed to train observers in using the
instrument further described each of the indicators.
Trainees received a summary of the research upon which the
instrument  was  based.  Each  trainee  was  also  given  a Coding 
Manual  which  had  been  developed  to  specify  and  describe  the 
teacher  behaviors  each  of  the  indicators  encompassed.  The 
Coding  Manual  also  included  directions  on  how  to  code 
behavior  on  the  instrument  and  cautions  regarding  inferences 
that  may  or  may  not  be  made.  Materials  developed  for  use  in 
the  training  sessions  included  an  introductory  videotape 
which  contained  specific  visual  examples  of  the  indicators, 
an  audiotape  of  classroom  vignettes,  videotapes  of  actual 
physical  education  activity  classes,  a training  manual,  and 
a practice  test.  Evaluation  materials  included  a written 
comprehension  check  (test)  and  criterion  videotape. 

Research  Problems  and  Study  Design 

Research problems. Three problems had to be
resolved as follows:

Problem A: To identify those effective and
ineffective teaching behaviors that should
or should not be displayed by physical
education teachers in an activity setting

Problem B: To develop a teacher observation
instrument based on the research that was
identified

Problem C: To develop a package for training
observers to use the instrument and to train
a group of observers to test the usability
of the instrument

Study design. The first two problems were addressed
in Chapters II and IV by a thorough review and synthesis of
research on the teaching of physical education motor skills
and on instrument design and development. In Chapter III
the training design was addressed. The evaluation of the
training and the instrument made use of a single-factor,
posttest-only design with three levels. Subjects were
randomly assigned to one of three groups. Two groups
participated in training sessions, and the third group was
the control group. One training session (n = 10) was
conducted by the developer of the instrument; another one
(n = 10) was conducted by a trainee from the initial session
using the procedures detailed in a training manual that had
been developed. The purpose of this second session was to
see if the training was as effective when conducted by
someone other than the developer. Both groups received the
same multiple-choice examination and observed the same
criterion videotape at the end of the session. A third
group (n = 9), the control group, took the two final
evaluations without benefit of a training session. The two
groups that went through a training session were asked to
complete two surveys to evaluate the training materials and
the trainer.
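
The random assignment step of this design can be sketched as follows. The procedure actually used in the study is not described in this section, so this is only one plausible approach: shuffle the participant list and slice it into groups of the reported sizes (9, 10, and 10). The function name `assign_groups`, the participant IDs, and the seed are hypothetical.

```python
import random


def assign_groups(participants, sizes, seed=None):
    """Shuffle participants and split them into groups of the given sizes."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)

    groups, start = {}, 0
    for label, size in sizes.items():
        groups[label] = shuffled[start:start + size]
        start += size
    return groups


# 29 hypothetical participant IDs; group sizes follow the study design.
participants = [f"P{i:02d}" for i in range(1, 30)]
groups = assign_groups(participants, {"A": 9, "B": 10, "C": 10}, seed=1988)
```

Fixing the seed makes the assignment reproducible, which is useful when the allocation has to be documented after the fact.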

Findings  Based  on  Analysis  of  Data 

1.  Results  of  an  ANOVA  on  the  comprehension  check 
revealed  a significant  difference  among  the  three  groups 
with  the  training  groups  having  the  higher  mean  scores. 
However,  in  the  follow-up  with  the  Bonferroni  t-test  the 
differences  were  not  statistically  significant  at  the 
predetermined  alpha  level  of  .05. 

2. Statistically significant differences among the
groups were found for 6 of the 10 accuracy measures in which
subjects coded videotaped lessons and were rated against a
criterion. Further analysis revealed a statistically
significant difference between the control group and one or
both of the training groups for 5 of the 6 measures:
Efficient Use of Time (effective and ineffective), Lesson
Development (effective), Demonstrations (effective), and
Feedback (ineffective).

3. There were no significant differences on the
accuracy measures between Group C (second-level training)
and Group B (developer training).

4.  Trainees  responded  to  the  survey  instrument  to 
the  effect  that  the  training  materials  and  training  method 
met  the  established  criteria. 

Study  Conclusions  and  Implications 

1. According to the data analyses, the 5-1/2 hour
training session, combined with the homework, was sufficient
to train potential observers in the use of this instrument.
A subjective evaluation by the trainers, however, resulted
in a recommendation that the on-site training be increased
at least 30 minutes to provide for more observation
practice.

2.  Since  trainers  may  vary  in  what  they  choose  to 
cover  in  their  oral  presentation,  observers  should  be  given 
in  writing  all  the  information  needed  to  use  the  instrument 
effectively.  The  effective  replication  of  the  training 
package  is  based,  in  part,  on  the  completeness  of  the 
training  materials. 

3. Use of this instrument in studies of teacher
effectiveness may be possible due to the fact that observer
effectiveness was determined by performance on a
criterion-referenced videotape. Frick and Semmel (1978) reported that
when  drawing  conclusions  from  observational  studies,  one  of 
the  problems  in  relating  teacher  behavior  to  pupil  growth  is 
the  difficulty  of  generalizing  findings  in  situations  when 
information  about  the  behavioral  variables  is  inadequate. 
"Criterion-referenced  measures  of  observer  agreement  within 
the  context  of  an  observation  system  instructional  package 
could  help  reduce  this  problem"  (p.  161). 

4.  The  indicators  selected  for  inclusion  on  the 
observation  instrument  appear  to  be  associated  with  student 
achievement  in  activity/motor  skills  settings.  A recent 
article  by  Graham  (1987)  confirmed  the  inclusion  of  many  of 
the indicators. Graham, based upon his research, listed
five reasons for low motor skill acquisition: (a) loss of
instructional time due to teacher talk, management, and
student waiting; (b) too little practice time and lack of
feedback on qualitative aspects of motor skills;
(c) inadequate amounts of specific feedback on skill
performance; (d) too much time spent on playing games rather
than actually practicing motor skills; and (e) a lack of
specificity and variability in skills instruction and

Issues Unresolved

There are two important ways to improve the
effectiveness  of  teachers.  One  is  by  changing  the  way  they 
are  evaluated  and  the  other  is  by  changing  the  way  they  are 
educated  (Medley,  1979).  The  aspect  of  teacher  education  is 
beyond  the  scope  of  this  study.  A change  in  the  process  of 
evaluating  teachers  is  implicit.  Teacher  evaluators, 
however,  must  be  cognizant  of  several  cautions  implied  by 
the  results  of  this  study.  They  are  as  follows: 

1. The instrument has been used with elementary
and secondary students in activity classes
in gymnastics, badminton, basketball, and
manipulative skills. Although the
instrument seems to have generalizability to
other activity classes, this may not be the
case.

2.  This  study  was  directed  to  the  question  of 
how  to  identify  effective  teaching  without 
addressing  the  important  question  of  how  to 
bring  about  changes  in  teaching  behavior. 

Recommendations 

On the basis of the results of this study, the
following recommendations are made concerning the
development and use of this, and other, teacher observation
instruments.

1. It is critical that more pedagogical
research, especially in physical education,
be undertaken. Studies in this area should
be a high priority for physical education
researchers.

2. Although the data have indicated that this
instrument has a degree of usability, it is
important to conduct studies to establish
its reliability since reliable measurements
are ultimately essential to generalizations
about relationships among educational
processes and outcomes (Frick & Semmel,
1978).

3. There is a possibility that the indicators
developed for this instrument may lend
themselves, with appropriate
reinterpretation and definition, to other
subject areas which rely heavily on the
learning of motor skills. Studies should be
conducted in subject areas such as
industrial arts, business, home economics,
art, music, and certain exceptional student
education areas.

4.  Studies  should  be  conducted  that  would 
correlate  ratings  from  this  instrument  with 
the  amount  of  student  on-task  behavior  and 
other  outcome  variables. 

5.  Further  analyses  of  participants'  tallies  on 
the  indicators  should  be  conducted  and  the 
data  used  to  clarify  definitions  further  and 
revise  training  procedures.  Audiotapes  or 
videotapes  used  for  training  should  contain 
as  many  examples  of  teacher  behavior  as  is 
practical.  All  coding  directions  and 
cautions  should  be  contained  in  the  written 
materials  received  by  each  trainee. 

6.  The  problem  of  whether  there  were 
differences  between  the  ratings  of  observers 
from  the  field  of  physical  education  and 
those  with  no  specific  physical  education 
training  was  not  dealt  with  during  this 
study.  Also  not  dealt  with  was  the  concern 
as  to  whether  or  not  participants  with  prior 
FPMS  training  or  certification  performed  any 
better  when  compared  with  the  criterion. 
Knowledge  of  the  possible  effects  of  either 
or  both  of  these  factors  would  have 
implications regarding training. Further
statistical analysis should be done using
the data collected during this study to see
if there were any significant differences.


APPENDIX  A 

FLORIDA  PERFORMANCE  MEASUREMENT  SYSTEM 
INSTRUMENT  3.0 


FORMATIVE CLASSROOM OBSERVATION INSTRUMENT
3.0 INSTRUCTIONAL ORGANIZATION AND DEVELOPMENT

Florida Performance Measurement System
Coalition for the Development of a Performance Evaluation System
Office of Teacher Education, Certification and Inservice Staff Development

Tallahassee, Florida


This instrument is designed to record effective indicators of teacher behavior in the domain of
Instructional Organization and Development. The instrument is divided into five categories: Use
of Time; Review/Summary; Lesson Development; Teacher Treatment of Student Talk/Feedback;
Homework/Seatwork.


Directions:

1. Place a mark in the appropriate box when a relevant behavior is observed. (Effective items
are on the left of the instrument and ineffective items are on the right).

2. Mark an item each time it is observed.

3. Sum frequencies by indicators and record subtotals. Sum subtotals for each category
and record an effective and ineffective total for the categories in the appropriate
spaces provided below.


DATA SUMMARY

         CATEGORY                                      EFFECTIVE    INEFFECTIVE

3.1      Use of Time
3.2      Review/Summary
3.3      Lesson Development
3.4-3.5  Teacher Treatment of Student Talk/Feedback
3.6      Homework/Seatwork

         TOTAL

COMMENTS
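
The summing procedure in direction 3 can be sketched as below. This is an illustration only: the representation of a mark as a (category, indicator, side) tuple and the function name `summarize_tallies` are assumptions, and the marks are hypothetical.

```python
from collections import Counter


def summarize_tallies(marks):
    """Sum marks by indicator, then by category and side.

    marks is an iterable of (category, indicator, side) tuples, where side
    is "effective" or "ineffective". One tuple is recorded per observation.
    """
    by_indicator = Counter(marks)
    by_category = Counter()
    for (category, _indicator, side), count in by_indicator.items():
        by_category[(category, side)] += count
    return by_indicator, by_category


# Hypothetical marks from a short observation.
marks = [
    ("Use of Time", "Begins lesson promptly", "effective"),
    ("Use of Time", "Begins lesson promptly", "effective"),
    ("Use of Time", "Delays starting the lesson", "ineffective"),
    ("Lesson Development", "Conducts review", "effective"),
]
by_indicator, by_category = summarize_tallies(marks)
```

The per-indicator counts correspond to the subtotals on the form, and the per-category counts correspond to the effective and ineffective totals entered in the data summary.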
APPENDIX B

INSTRUCTIONAL  ORGANIZATION  AND 
DEVELOPMENT — PHYSICAL  EDUCATION 
INSTRUMENT 


INSTRUCTIONAL ORGANIZATION AND DEVELOPMENT


CATEGORY / EFFECTIVE INDICATOR (FREQUENCY)            INEFFECTIVE INDICATOR (FREQUENCY)

EFFICIENT USE OF TIME

BEGINS LESSON PROMPTLY/EFFICIENTLY                    DELAYS STARTING THE LESSON/HIGH MANAGEMENT
PROVIDES ACTIVITIES/GIVES DIRECTIONS                  DELAYS/STUDENTS WAIT FOR DIRECTIONS/ASSISTANCE/
                                                      TEACHER PROLONGS TALK
MATERIALS IN ORDER/MINIMAL TIME FOR DISTRIBUTION      MATERIALS NOT IMMEDIATELY AVAILABLE/DISORGANIZED
MONITORS ACTIVITIES:                                  MONITORING OF ACTIVITIES:
  CORRECTS/ASSISTS                                      OBSERVES, NO INTERACTION
  GENERAL COMMENTS                                      OBSERVES, INADEQ. INTERACTION
  OTHER                                                 ENGAGES IN OTHER ACTIVITIES

LESSON DEVELOPMENT

ORIENTS STUDENTS TO LESSON (GOALS/OBJECTIVES)
CONDUCTS REVIEW
TALKS/PERFORMS ON SUBJECT                             TALKS/PERFORMS OFF SUBJECT
                                                      MULTIPLE QUESTIONS
STUDENT COMPREHENSION                                 UNISON RESPONSE
PAUSES AFTER COMPLEX QUESTION                         SOLICITS IMMEDIATE RESPONSE

DEMONSTRATIONS

DEMONSTRATION AT BEGINNING OF SKILL INSTRUCTION       NO DEMONSTRATION/DEMONSTRATION AFTER SKILL
                                                      IS ATTEMPTED
DEMONSTRATION IS CORRECT                              DEMONSTRATION IS INCORRECT
ENVIRONMENT/INPUT CUES ARE CONTROLLED                 NO ATTEMPT TO CONTROL CUES/ENVIRONMENT
VERBAL CUES ARE USED
DEMONSTRATION IS SEQUENCED                            OUT OF SEQUENCE/NO SEQUENCE
DEMONSTRATION IS AT CORRECT LEVEL
SAFETY POINTS ARE COVERED

SUPERVISED PERFORMANCE

PRACTICE GOALS STATED                                 NO PRACTICE
                                                      NO GOALS STATED
WARM-UPS RELATED TO LESSON                            NO WARM-UPS/GENERAL
PRACTICE IS AT HIGH SUCCESS LEVEL                     PRACTICE IS AT LOW SUCCESS LEVEL
COMPONENTS/BRIEF VERBAL/PRACTICE-PLAY-PRACTICE        COMPLEX SKILLS ARE NOT BROKEN DOWN
VARYING CONDITIONS/TASK SPECIFIC/DISTRIBUTED          STUDENTS LISTEN/WAIT TO PARTICIPATE
QUALITY MONITORED:                                    DOES NOT MONITOR:
  CORRECTS/ASSISTS                                      OBSERVES, NO INTERACTION
  OFFICIATES                                            OBSERVES, INADEQ. INTERACTION
  RECORDS
  OTHER                                                 ENGAGES IN OTHER ACTIVITIES

FEEDBACK

FEEDBACK IS GIVEN                                     NO FEEDBACK
                                                      STUDENT IGNORED/HARSHNESS
AT STUDENT'S LEVEL OF UNDERSTANDING                   NOT AT STUDENT'S LEVEL OF UNDERSTANDING
TIMELY                                                DELAYED
ACCURATE/SPECIFIC/[illegible]                         WRONG OR IMPRECISE
SUCCESS-ORIENTED                                      FAILURE-ORIENTED/CRITICAL
SPECIFIC PRAISE                                       GENERAL PRAISE
HIGH-EXPECTATION LEVEL IS COMMUNICATED                LOW OR NO EXPECTATION LEVEL IS COMMUNICATED

APPENDIX  C 

SAMPLE  PAGES  FROM  CODING  MANUAL 


MANUAL FOR CODING TEACHER PERFORMANCE
IN PHYSICAL EDUCATION ACTIVITY CLASSES
ON AN INSTRUCTIONAL ORGANIZATION AND
DEVELOPMENT - PHYSICAL EDUCATION FORMATIVE INSTRUMENT


INTRODUCTION


Teacher  observation  and  evaluation  are  being  called 
for  more  and  more  by  the  press,  community,  and  educational 
administrators.  Decisions  regarding  a teacher's  job  status 
or  tenure  are  often  being  made  on  the  basis  of  data 
collected  through  infrequent  visits  to  a classroom  on  an 
instrument  that  may  report  actual  teacher  behavior  in  vague 
and highly judgmental terms. Several problems are inherent
in this type of evaluation: infrequent visits may lead
to improper generalizations about a teacher's
teaching behavior; instruments may describe behaviors that
represent the developer's opinion rather than
behaviors that research has shown to be effective in
raising student achievement; and, most important of all, the
evaluations may be used only to make final judgments
about a teacher rather than to help that teacher grow
professionally.

The need to help teachers become more
effective calls for two types of evaluation:
formative and summative. A summative evaluation provides
the data to make decisions regarding retention, promotion,
or dismissal of a teacher. A formative evaluation
supplies a tool for improving teacher
performance. It is conducted for the purpose of determining
which specific skills a teacher needs to improve.

Effective educational administrators and supervisors
need to be able to conduct both types of observations. The
formative process enables an educator to work in a
supportive role with teachers. To do so, administrators and
supervisors need a thorough knowledge of the skills needed
by effective teachers and must be able to use an observation
instrument that has been developed for the appropriate
teaching situation.

This observation instrument was
developed for use in physical education activity
classes. It consists of items in the following categories:

Efficient  Use  of  Time 
Lesson  Development 
Demonstrations 
Supervised  Performance 
Feedback 


GENERAL  DIRECTIONS 


To  be  able  to  observe  teacher  performance  requires 
extensive  training  as  well  as  knowledge  of  the  items  and 
their  supporting  research.  The  ability  to  observe 
objectively  does  not  come  easily  in  any  profession. 

Observing  teacher  behavior  in  an  objective  manner  is 
complicated  by  problems  of  interpretation  of  teacher 
behaviors  and  of  instrument  items,  the  speed  of  teacher- 
student  interactions,  and  possible  biases  of  the  observer 
accumulated  often  unwittingly  from  pedagogical  experiences 
and  ideologies.  It  is  crucial,  for  observing  objectively, 
that  the  observer  remember  that  the  specific  action  that  is 
occurring  in  the  class  is  what  is  to  be  recorded  and  that 
he/she  is  just  collecting  data  and  should  not  be  trying  to 
judge  whether  the  teacher's  performance  is  good  or  bad, 
effective  or  ineffective. 

This  instrument  is  an  observation  tool  for  coding 
performance  as  it  occurs  in  an  activity  class.  It  is  to  be 
done in a systematic manner and is to be nonevaluative. The
observer  should  be  trained  to  recognize  the  teacher 
behaviors  listed  on  the  instrument  and  to  record  them  in  the 
appropriate  spaces  without  evaluating  or  interpreting  those 
behaviors.  The  only  judgment  required  of  the  observer  is  at 
the  level  of  whether  a particular  teacher  behavior  fits  an 
item on the instrument. The events are to be recorded, not
analyzed.

The  observer  must  know  the  terminology  of  the 
observation  instrument  and  be  trained  to  recognize  items  in 
behavioral  terms.  Accuracy  of  observation  is  developed 
through  training  which  includes  learning  the  location  of 
items  and  practice  in  coding.  The  instrument  is  formatted 
in  the  order  in  which  teacher  behaviors  will  most  commonly 
be observed during a lesson. Either the item "Begins
Lesson Promptly/Efficiently" or the item "Delays Starting the
Lesson/High Management" will be observed and tallied at the
start of the lesson. Some of the other items in the first
two categories, Efficient Use of Time and Lesson
Development, may also be observed at the start of the lesson
(e.g., Materials and Equipment, Orientation, Review). Other
items  in  these  two  categories  may  be  observed  throughout  the 
lesson  (e.g.,  Providing  Activities,  Talking/Performing, 
Questioning).  Items  in  the  category  of  Demonstrations  will 
be tallied only during demonstrations. After a
demonstration, students should be given practice on the
skill  that  was  demonstrated.  The  category  of  Supervised 
Performance  is  listed  next  on  the  instrument  since  any 
practice  should  generally  come  after  a demonstration.  The 
Feedback  category  is  last  on  the  instrument.  Feedback 
statements  may  occur  at  any  time  during  the  lesson,  but  will 
be  heard  most  commonly  during  the  practice  session  that 
follows  a demonstration.  Feedback  statements  relate  to 
those  skills  that  are  involved  in  correct  performance  of  the 
game  or  activity  that  is  being  taught,  and  not  to  behavior. 
In  some  situations  the  feedback  statements  will  relate 
directly  to  the  goals  or  skills  established  for  the  lesson; 
in  other  situations,  feedback  statements  will  be  given  for 
other  skills  involved  in  the  game  or  activity  that  is  being 
taught.  In  the  first  case,  the  feedback  will  be  coded  under 
the  Feedback  category;  in  the  latter  case,  it  will  be  coded 
under the Efficient Use of Time category (Monitoring).

While  the  words  used  on  the  instrument  were  chosen 
with  great  care  to  describe  the  items  as  closely  as 
possible,  there  may  be  other  behaviors  included  in  a 
specific  item  that  are  not  indicated  by  the  descriptive 
words  on  the  instrument.  Effective  observers  will  become 
fully  aware  of  all  behaviors  that  are  covered  and  will  not 
depend  upon  the  description  alone.  It  is  also  important  for 
observers  to  remember  that  the  teacher  will  perform  many 
behaviors  during  a lesson  and  not  all  of  those  behaviors  are 
included  on  this  instrument.  Conversely,  the  teacher  will 
not  necessarily  demonstrate  all  the  behaviors  found  on  the 
instrument,  whether  listed  on  the  right  or  the  left.  An 
observer  should  tally  only  behavior  that  is  observed.  If  a 
behavior is not seen, do not assume it occurred, nor assume
that  it  should  have  occurred. 

Code  any  behaviors  you  observe  on  the  theoretical 
bases  established  by  the  research.  Resist  a tendency  to 
draw  on  your  knowledge  and  experience  in  order  to  make  more 
subtle  discriminations  than  are  called  for.  It  is  important 
for  an  observer  to  tally  on  first  impressions  and  not  to 
infer  hidden  meanings  in  observed  teacher  behavior.  Make  no 
inferences  as  to  how  a teacher  should  teach  a certain 
activity,  only  whether  or  not  the  teacher  is  performing  one 
of  the  items  on  the  instrument. 

At  the  conclusion  of  any  observation,  a post 
conference  should  be  conducted.  A suggested  post  conference 
form  is  included  at  the  back  of  this  manual.  This 
conference  should  be  a cooperative  one  in  which  both  the 
observer and the teacher being observed discuss the
behaviors that were observed and arrive at suggestions for
improving areas noted as needing improvement.


SUMMARY:  CODING  RULES 

1. Code by making a vertical mark next to the
appropriate item. Cross-hatch the fifth mark for
ease of summing coded observations.

2. Code only teacher behavior that is actually
observed.

a. Do not assume a behavior would have
occurred.

b.  Do  not  code  on  any  generalized  impressions. 

3.  Code  only  at  the  time  a behavior  is  observed.  Do 
not  add  tally  marks  after  the  lesson  is  concluded. 

4.  Code  for  the  entire  lesson. 

5.  At  the  end  of  the  observation,  total  the  number  of 
tallies  for  each  item  and  write  the  number  in  the 
frequency  column  and  on  the  post-conferencing  form. 

6.  Do  not  code  during  any  time  used  for  testing.  This 
instrument  is  designed  for  instructional  situations. 
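
The tallying procedure in rules 1, 3, and 5 can be pictured as a small data structure. The following is only an illustrative sketch in Python and is not part of the original training package; the class name TallySheet, the function tally_marks, and the three item names passed to the sheet are hypothetical choices made for the example.

```python
from collections import Counter


def tally_marks(count):
    """Render a count as tally groups: four vertical marks crossed by a fifth."""
    full_groups, remainder = divmod(count, 5)
    groups = ["||||/"] * full_groups          # every fifth mark is the cross-hatch
    if remainder:
        groups.append("|" * remainder)
    return " ".join(groups)


class TallySheet:
    """Hypothetical stand-in for the frequency column of the instrument."""

    def __init__(self, items):
        self.items = list(items)
        self.counts = Counter()

    def mark(self, item):
        """Add one tally at the moment a behavior is observed (rule 3)."""
        if item not in self.items:
            raise KeyError(f"{item!r} is not an item on this sheet")
        self.counts[item] += 1

    def totals(self):
        """Frequency per item, written down at the end of the lesson (rule 5)."""
        return {item: self.counts[item] for item in self.items}


# Assumed example: seven timely feedback statements and one delayed one.
sheet = TallySheet(["Timely", "Delayed", "Specific Praise"])
for _ in range(7):
    sheet.mark("Timely")
sheet.mark("Delayed")

for item, total in sheet.totals().items():
    print(f"{item:<16} {tally_marks(total):<10} {total}")
```

In this sketch an item that is never observed simply keeps a total of zero, which mirrors rule 2: nothing is tallied unless it is actually seen.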

The  items  in  the  category  labeled  Supervised 
Performance  were  arrived  at  from  the  literature  on  practice. 
This  label  was  used,  instead  of  the  more  restrictive  one 
"Practice,"  in  order  to  indicate  the  inclusion  of  games  and 
other  activities. 


CODING  DIRECTIONS 

Each time a teacher behavior that has an item on the
instrument is observed, a vertical mark (|)
should be made in the frequency column immediately next to
the appropriate item. When a fifth occurrence of that
behavior is observed, a diagonal crosshatch (/) should
be made through the other four marks. This enables
the behaviors to be totaled more quickly. When the observer
sees teacher behaviors that are not part of the instrument
but that need to be discussed during the post conference,
the observer may make comments and notations around the
edge of the instrument. Care must be taken, however, not to
make these remarks so lengthy that other teacher behaviors
included on the instrument are missed.

Each  teacher  behavior  should  be  recorded  once,  at 
the  time  it  occurs.  There  are  some  extensions/exceptions  to 
this  rule. 

1. A behavior may be recorded in more than one
space. For example, a teacher may be
monitoring the movement quality of students
during practice and also asking questions.
In this case the items Movement Quality Is
Monitored and Questions Student
Comprehension should both be tallied.

2.  When  a teacher  continues  a behavior  over  a 
period  of  time,  it  should  be  tallied  each 
time  the  teacher  initiates  the  behavior. 

For example: If a teacher asks several
questions in a row, tally each question. If
a teacher  gives  feedback,  tally  the  feedback 
given  to  each  student.  If  a teacher  gives 
directions,  tally  each  time  the  teacher 
begins  a new  set  of  directions  or  returns  to 
the  initial  set  of  directions  after 
performing  another  behavior.  When  the 
teacher  is  Monitoring  Activities  or  Quality 
of  Movement,  tally  one  of  the  four  items 
(effective  and  ineffective  included)  each 
time  the  teacher  makes  a statement. 
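
The two extensions above amount to simple counting rules: one mark per initiation, and one mark under every item that a behavior fits. A minimal Python sketch follows. The function name tally_events and the short event log are hypothetical, and each entry in the log stands for one initiation of a behavior as an observer might code it.

```python
from collections import Counter


def tally_events(events):
    """Count one mark per item for every initiated behavior.

    `events` is a time-ordered list; each entry is a tuple of the item
    names that a single observed behavior fits (usually one, sometimes two).
    """
    counts = Counter()
    for items in events:
        counts.update(items)      # a behavior may be recorded in more than one space
    return counts


# Assumed example log, not data from the study.
events = [
    ("Provides Activities/Gives Directions",),
    ("Questions Student Comprehension",),       # three questions in a row:
    ("Questions Student Comprehension",),       # each question is tallied
    ("Questions Student Comprehension",),
    ("Movement Quality Is Monitored",
     "Questions Student Comprehension"),        # monitoring while asking a question
    ("Provides Activities/Gives Directions",),  # returns to directions: tallied again
]

counts = tally_events(events)
for item, total in counts.items():
    print(item, total)
```

Keeping each entry as a tuple, even when only one item applies, keeps the single-item and double-item cases on the same code path.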

It  is  important  for  the  observer  to  understand  that 
the  examples  in  this  manual  are  not  exhaustive.  Use  them  as 
a guide  to  the  behaviors  that  can  be  expected  for  each  item. 
If  there  is  still  doubt  about  a behavior,  refer  to  the 
research  supporting  the  instrument.  Remember  that  not  all 
behaviors  a teacher  performs  in  the  classroom  will  be  found 
on  this  instrument. 

During  an  actual  observation,  when  there  is  a doubt 
about  where  to  classify  a behavior,  tally  it  next  to  the 
item  that  initially  comes  to  mind  and  check  the  manual 
and/or research document later to confirm your tally.


INFERENCES

It  is  important  for  the  observer  to  be  aware  that 
there  are  limitations  in  the  inferences  that  can  be  made 
after  observing  a teacher  using  this  instrument.  These  are 
detailed  below. 

1. The behaviors included on this instrument
are not inclusive of all behaviors a teacher
may demonstrate. There are other teacher
behaviors, both effective and ineffective,
that constitute the total reality of any
teacher's performance. It is important not
to limit the teacher to the behaviors on
this instrument or to make inferences about
the teacher's ability on the basis of one
observation.

2.  It  is  important  not  to  assume  that  the 
behaviors  tallied  as  effective  or 
ineffective  were  used  effectively  or 
ineffectively  during  instruction. 

3.  The  instrument  covers  teacher  behaviors 
associated with direct instruction.
Although direct instruction has been shown
to  be  associated  with  increased  student 
achievement,  other  teaching  methods  are  also 
effective.  Supervisors  should  not  insist 
that  the  techniques  included  on  this 
instrument  are  the  only  correct  ones. 

4.  It  may  be  more  appropriate,  in  some 
situations,  to  evaluate  the  teacher  by  some 
other  means.  Observing  with  this  instrument 
should  be  only  one  method  of  teacher 
evaluation. 

5.  In  making  inferences  about  the  effectiveness 
of  a teacher's  behavior,  it  is  important  to 
keep  the  instability  of  teacher  behavior  in 
mind. One of the assumptions associated
with the effective use of an observation
instrument is that the behavior and strategy
of teachers remain constant, or stable.
This has not proven to be the case. Some
researchers concluded that factors such as
the time of day, grade level, and day of the
week had a negligible influence on the
teacher's instructional pattern, yet others
concluded  that  there  was  as  much  variability 
for  each  teacher  within  a subject  as  between 
different  teachers  of  that  subject. 
Variability  of  the  object  of  observation  has 
been  listed  as  the  most  important  source  of 
error  variance.  It  is  of  critical 
importance,  in  teacher  observation,  to 
sample  teacher  behavior  meticulously.  It  is 
better to increase the number of
observations than the number of
observers. In one study, increasing the
number of observers from 1 to 12 was shown
to be only half as effective as using the 12
observers individually on 12 different
occasions.
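
One way to see why added occasions can help more than added observers is a simple error-variance calculation. The sketch below is an assumption-laden illustration in Python, not the analysis used in the study mentioned above. It supposes that an observed score is the teacher's typical behavior plus an independent occasion-to-occasion deviation plus an independent observer error, and the variance values are invented for the example.

```python
def error_variance(var_occasion, var_observer, n_occasions, n_observers):
    """Error variance of a mean score under a simple additive model.

    Assumes every occasion is watched by `n_observers` observers and that
    occasion effects and observer errors are independent. Averaging over
    occasions shrinks both terms; averaging over observers shrinks only
    the observer term.
    """
    return (var_occasion / n_occasions
            + var_observer / (n_occasions * n_observers))


# Hypothetical variance components (arbitrary units).
VAR_OCCASION = 1.0   # how much a teacher varies from lesson to lesson
VAR_OBSERVER = 1.0   # how much observers disagree about the same lesson

one_visit_twelve_observers = error_variance(VAR_OCCASION, VAR_OBSERVER, 1, 12)
twelve_visits_one_observer = error_variance(VAR_OCCASION, VAR_OBSERVER, 12, 1)

print(round(one_visit_twelve_observers, 3))
print(round(twelve_visits_one_observer, 3))
```

With these made-up values, twelve separate visits leave far less error than twelve observers at a single visit, because extra observers cannot average away lesson-to-lesson variation. The size of the advantage depends on the real variance components, which this manual does not report.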


CAUTIONS 

1. An observer might have a hidden agenda for the use
of the instrument and miscode teacher behaviors
because of the interpretation he or she
places on the behavior. Put simply,
the same behavior can be viewed as good
or bad depending upon the attitude of the observer.
Be sure to look for behaviors that have been
described in the manual, and do not read anything into the
teacher's behavior.

2. Another observer effect is known as the halo
effect. If the observer likes the
person being observed, then all behaviors are seen
as good, and there is a tendency to either ignore or
miscode behaviors that are ineffective.

3.  The  observer  might  have  his  or  her  own  idea  of  how 
something  should  be  done  and  be  looking  for  that 
behavior  only. 

4.  After  a period  of  time  watching  a lesson,  an 
observer's  attention  will  wander.  It  is  important 
to  keep  alert  during  the  observation  and  focused  on 
what  is  going  on  during  the  lesson. 

5. After a period of time (days, weeks, months, or
years), the accuracy of an observer on an instrument
will generally decline for several reasons.
This is called coder drift. Some of the reasons for
coder drift might include:

a.  Lack  of  use  by  the  observer 

b.  The  observer's  developing  his  or  her  own 
rules  for  coding  borderline  or  unusual 
behaviors 

Even  though  observers  are  trained  to  a high  level  of 
agreement  during  the  training  session,  the  problem 
of  coder  drift  will  eventually  arise.  Periodic 
updating  or  retraining  sessions  must  be  held  so  that 
all  observers  are  associating  the  teacher's 
behaviors  with  the  appropriate  indicator  on  the 
instrument.
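
A periodic agreement check is one way to notice coder drift before it affects observations. The following Python sketch compares an observer's tallies for a practice videotape with a criterion coding of the same tape. The agreement formula (smaller count divided by larger count, summed over items) is one common convention and is an assumption here, as are the function name, the sample tallies, and the 0.80 retraining threshold; the manual does not specify any of them.

```python
def percent_agreement(observer, criterion):
    """Agreement between two sets of tallies, from 0.0 to 1.0.

    For each item the smaller count is treated as agreements and the
    larger count as the total number of codings, then both are summed.
    """
    items = set(observer) | set(criterion)
    agreements = sum(min(observer.get(i, 0), criterion.get(i, 0)) for i in items)
    total = sum(max(observer.get(i, 0), criterion.get(i, 0)) for i in items)
    return 1.0 if total == 0 else agreements / total


RETRAIN_BELOW = 0.80   # hypothetical cut-off, not taken from the manual

# Assumed example tallies for one practice videotape.
criterion = {"Timely": 5, "Delayed": 2, "Specific Praise": 3}
observer = {"Timely": 4, "Delayed": 2, "General Praise": 1, "Specific Praise": 3}

agreement = percent_agreement(observer, criterion)
needs_retraining = agreement < RETRAIN_BELOW
print(f"{agreement:.2f}", needs_retraining)
```

A check of this kind only compares totals per item. It would not reveal two coding errors that cancel each other out within the same item, so it is a screening aid rather than a substitute for the retraining sessions described above.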


OTHER  COMMENTS 

Teacher behavior and observer behavior will vary with
the context: physical, social, behavioral, and
temporal surroundings. The way the observers or teachers
feel, the teaching situation in which they find themselves,
the relationship between the two persons involved, the time
of the day, the type of student population in a particular
class, and many other factors are involved in this aspect
of observation. It is important for an observer to realize
that behaviors do change depending upon these and other
factors, and to guard against making
inferences that may not be true of the teacher's behavior in
all situations.

The  reliability  of  this  instrument  has  not  been 
established  at  this  time.  The  usability  of  the  instrument 
has  been  established. 

Although  it  is  important  for  you  to  follow  the 
coding  rules  that  have  been  established  so  that  a level  of 
consistency  can  be  assured,  much  more  leeway  is  available  to 
the  observer  when  using  a formative  observation  instrument 
since  feedback,  not  evaluation,  is  the  purpose  of  the 
formative  observation.  Whatever  modifications  make  the 
instrument  more  useful  are  permissible  so  long  as  they  do 
not  add  something  extraneous  to  the  system. 

It  would  be  appropriate  to  use  this  instrument  once 
or  twice  a month.  An  observation  should  last  an  entire 
period  in  order  to  get  a complete  picture  of  the  lesson.  A 
post-conference is a must after each observation (a sample
post-conference form is included at the end of the Coding
Manual).  To  improve  a teacher's  instructional  behavior, 
visitations  to  other  schools  with  an  effective  program  and 
participation  in  appropriate  inservice  training,  in  addition 
to  the  use  of  this  instrument  and  the  accompanying 
conferences,  would  be  of  value. 

Costs  that  will  be  incurred  through  the  use  of  the 
instrument  relate  to  the  value  placed  on  the  time  used  by 
the  observers  during  the  training  and  observing  processes, 
the  purchase  of  training  materials,  and  any  other  training 
costs.  The  training  packet  can  be  purchased  reasonably  from 
the  developer.  Other  training  costs  will  depend  upon  the 
method chosen for training at the individual site. There
may be costs for the trainer, released time for
participants, and facility and equipment rental.

It  is  important  to  remember  that,  although  observers 
can  be  trained  quite  well  with  carefully  designed 
instructional  packages,  there  is  no  guarantee  that  those  who 
perform  well  on  a criterion  test  will  perform  well  as  coders 
in  the  field. 

Even though each of these indicators was selected
because it correlated in some manner with increased
student achievement, care must still be taken in
interpreting the number of tallies as representing effective
teaching, because too many variables are
involved in a decision of that type.

Although  the  use  of  this  instrument  can  provide  a 
picture  of  the  teacher  and  his  or  her  activity,  it  is  also 
important  to  focus  on  student  activity  within  the  same 
educational  setting.  Student  involvement  in  the  subject 
matter  is  critical  for  increased  achievement  and,  although 
many  of  the  teacher  behaviors  on  this  instrument  have  been 
shown to increase student on-task behavior, there remains a
possibility that a significant number of students in the
class of a teacher who has many tallies on the effective
side of this instrument may be off-task. For a more
comprehensive evaluation, the supervisor should also look at
student on-task-as-stated behavior.

Instability of teacher behavior from occasion to
occasion provides one source of observational error.
Another source is the variance between, and within,
observers. Human raters seldom agree totally about an
observation. This can, however, be an advantage in looking
at a teacher's behavior. The use of multiple observers will
help ensure that a truer picture of that teacher emerges. The
implication is that greater improvement will result from a
more comprehensive system of observation.

In using this, or any, teacher observation instrument
based upon behavior, one concern is that voiced by researchers
termed functionalists, who have been attempting to record
behaviors in terms of their functions (i.e., what occurs
after they are emitted) rather than their formal response
properties. They believe that a teacher action is
important only as it stimulates appropriate student response, and
they argue that a misleading picture of what is
actually occurring in the classroom will be obtained if an
observer attends only to teacher behavior, isolating it from
student response.

This  instrument  would  be  an  effective  tool  for 
formative  data  collection  only  when  combined  with  feedback 
strategies such as follow-up conferences. It could provide
a continuous, observable process of staff development for
the improvement of teaching. It is not appropriate for a
summative  evaluation.  When  the  termination  of  a teacher  is 
the  point  of  an  evaluation,  this  instrument  should  only  be 
part  of  a series  of  evaluations  that  may  include  other 
methods  such  as  indirect  measures,  interviews,  competency 
tests,  classroom  observations,  student  ratings,  and  peer 
review. 


SKILLS: 

DEMONSTRATION,  PRACTICE,  AND  GAMES 


Users  of  this  instrument  should  note  that  it  was 
designed  to  be  used  with  physical  education  activity  classes 
and  not  in  a typical  classroom  setting.  Activity  classes 
may  take  on  many  formats.  In  some  classes  students  may 
learn  skills  through  demonstration  and  practice.  Other 
classes  may  concentrate  on  drills  and  scrimmages  (short 
game-like practices). Still other classes may play the game
that is the terminal objective of the unit of study (e.g.,
basketball, softball, hockey). No matter which lesson
format is being presented by the teacher, the instruction
is more effective if there is a specific goal.

One  of  the  commonly  accepted  major  goals  of  physical 
education  is  "to  increase  the  degree  to  which  students  like 
to  do  physically  active  motor  play;  that  is,  to  teach  them 
to  love  the  subject  matter"  (Siedentop,  1983).  This  goal 
conveys  the  connotation  that  physical  education  lessons 
should  be  enjoyable.  In  many  cases  when  a game  is  played, 
the  physical  education  teacher  has  assumed  that  for  it  to  be 
truly  enjoyable  students  should  not  be  held  accountable  for 
the  achievement  of  specific  objectives.  This  assumption  is 
not  valid,  according  to  Siedentop. 

How often have you seen students participate in
bump, pass, setting, and serving drills, only to
observe no evidence of those skills once a
volleyball game begins? . . . They want to have
fun, and often they are given no instructions other
than to play the game. The fault [with this lack of
skill] lies in the instructional system. Games can
be directed toward the achievement of specific
objectives and still be fun. . . . Objectives for
game play call the students' attention to important
instructional goals. This tends to make the game
play considerably more educational, and in the long
run more fun. . . . These, of course, are examples
of applying tasks. . . . In many activities,
scrimmage and game play represent the really
important application, the one from which the
students will derive most satisfaction. (p. 186)

The teacher should consider a game as a practice
session: one that uses the skills learned in discrete
demonstrations and drills in a holistic manner. Because of
this, the observer should expect to see teacher behaviors in
all of the categories (e.g., setting specific goals for
the game, giving demonstrations when appropriate, giving
feedback, and monitoring movement quality in relation to
the goals established for the game).


RESEARCH SUMMARY


A survey  of  research  has  indicated  that  teachers 
can,  and  do,  affect  the  achievement  of  their  students  and 
that  teacher  behaviors  can  have  positive  or  negative  effects 
on  student  behavior  and  achievement  in  an  activity 
situation.  The  amount  of  time  spent  on  skills  instruction 
and  practice  had  a strong  influence  on  the  quality  of 
student  achievement.  In  some  studies,  the  researchers  found 
students  spending  as  much  as  two-thirds  of  their  class  time 
waiting  to  participate.  Academic  Learning  Time-Physical 
Education  was  affected  primarily  by  the  nature  of  the 
activity,  the  amount  of  activity  time  available,  and  the 
efficient  use  of  activity  time.  Further,  it  was  the  amount 
of  appropriate  student  involvement  with  the  subject  matter 
that  was  positively  related  to  increases  in  student  learning 
in  physical  education  classes  and  in  the  acquisition  of 
motor  skills  and  not  simply  the  amount  of  time  scheduled  for 
physical  education. 

The need for student time-on-task, together with the
requirement that the time be spent on appropriate activities,
pointed to the need for effective use of specific teacher
behaviors that had been shown to affect the amount of time
students were on-task-as-stated during a lesson. It has
been demonstrated that the time spent by the teacher on
managerial episodes reduced the amount of time available for
activity. In one study, training reduced the time spent in
managerial activity from over 10 minutes during a 35-minute
period to 1 minute or less. The training introduced a
series of strategies involving specific teacher behaviors
such as beginning the class on time, taking roll within one
minute, and using enthusiasm, hustle, and verbal reminders of
appropriate behavior. Another study emphasized the
importance  of  teacher  monitoring  behavior  in  keeping 
students  on-task.  The  data  indicated  that  the  amount  of 
active  or  passive  supervision  engaged  in  by  the  instructor 
was  a key  factor  in  maintaining  student  task  involvement. 

The  need  for  teachers  to  have  equipment  set  up  with  a 
minimum  of  lost  time,  papers  handed  out  quickly,  and  other 
routine  tasks  done  promptly  was  also  reported. 

Based  on  the  research  reviewed,  the  use  of  goals  and 
objectives,  reviews  of  previously  learned  material, 
effective  use  of  questions  to  check  student  comprehension, 
and not digressing from the subject matter, all contributed
to the structure that an effectively planned lesson must
have.  Although  the  research  findings  related  to  the  value 
of  goals  and  objectives  demonstrated  some  inconsistency 
across  studies,  all  studies  affirmed  that  the  use  of 
specific  goals  increased  on-task  student  behavior  and  the 
level of achievement. Review of subject matter at the
beginning of new lessons, at the end of each lesson, and at
weekly and monthly intervals was demonstrated
to be an effective instructional variable.

From  the  studies  it  appeared  that  verbal  guidance  by 
the  teacher  could  accelerate  the  learning  of  motor  skills, 
but  that  it  should  not  interfere  with  needed  practice  time. 
Lecture  sessions  should  be  brief  and  interspersed  between 
opportunities  for  motor  responses. 

Data  supported  the  profitability  of  demonstrations 
in  the  learning  of  motor  skills.  There  were  specific 
teacher  behaviors,  however,  that  could  affect  the  efficacy 
of a demonstration. These behaviors, among others, included
the following: demonstrating at the beginning of a learning
task, presenting correct demonstrations, sequencing
demonstrations appropriately, using verbal cues, controlling
input cues, checking student comprehension, and covering
safety points.

Practice  was  a key  component  in  learning  motor 
skills,  both  simple  and  complex.  The  studies  indicated 
that,  although  specific  instruction  resulted  in  the  learning 
of  skills,  without  practice  there  was  practically  no 
improvement.  It  is  important  to  note,  however,  that 
learning  was  related  to  the  number  of  trials  attempted  by  a 
performer  and  not  simply  to  the  amount  of  time  spent  in  the 
activity.  Research  data  revealed  that  teachers  should  break 
up  their  instructional  time  (talking,  lecturing, 
demonstrating)  with  many  opportunities  for  motor  responses, 
thus  limiting  the  amount  of  verbal  information  that  students 
have  to  process  at  any  one  time  and  increasing  the  number  of 
motor  skill  trials  students  can  engage  in.  Practices  were 
more  effective  when  the  teacher  stated  goals  for  the 
practice  and  monitored  the  movement  quality  of  the  practice. 
The  importance  of  task  specificity  during  practice  was 
asserted  and  the  need  for  skill  practice  to  match  closely 
the  situation  in  which  the  skill  would  be  used  was 
emphasized.  The  use  of  a variety  of  practice  conditions  in 
order  to  reduce  the  possibility  of  error  during  game  play 
was  encouraged  by  some  researchers. 

Research data highlight feedback (also called
knowledge  of  results)  as  the  strongest,  most  important 
variable  related  to  performance  and  learning.  It  was  stated 
that  there  was  no  skill  improvement  without  feedback, 
progressive  improvement  with  feedback,  and  a regression  of 
skill  when  feedback  was  no  longer  given.  Five 
generalizations  were  reported.  First,  skill  improvement 
would  be  more  rapid  and  a higher  performance  level  would 
result  if  feedback  was  specific.  Second,  feedback  increased 
the  rate  and  level  of  learning.  Third,  the  motivation  level 
of  learners  was  affected  by  feedback,  usually  in  a positive 
manner.  Fourth,  if  feedback  were  delayed,  it  had  less 
effect  than  feedback  given  in  a timely  manner.  Fifth, 
performance  decreased  when  knowledge  of  performance  was 
decreased.  Although  the  type  of  feedback  that  was  given 
could  be  varied,  it  was  important  that  it  be  accurate, 
specific,  and  complete,  and  that  the  student  or  students 
understand  it.  The  value  of  success-oriented  feedback 
rather  than  failure-oriented  feedback  was  reported. 
Researchers  found  that  feedback  affected  males  and  females 
differently: The self-confidence of the women was found to
be negatively affected by failure feedback. Finally, the
level  of  expectation  a teacher  had  for  a student  was  shown 
to  be  extremely  influential  in  the  performance  level  of  that 
student.  Student  achievement  was  higher  when  the  teacher 
expected  quality  movement  from  the  students  and  not  simply 
good  attempts.  Teachers  were  found  to  communicate  their 
expectations  through  verbal  forms  of  communication  and  such 
overt,  nonverbal  means  as  nods,  winks,  smiles,  and  pats  on 
the  back.  Less  overt  actions  could  include  selecting  a 
student  for  a leadership  role,  rotating  students  so  that 
everyone  had  an  opportunity  to  play  all  positions,  or  using 
ideas  in  class  that  were  proposed  by  students. 


EFFECTIVE  USE  OF  TIME 


PRINCIPLE: If the teacher is efficient in
the use of instructional time by beginning
and  ending  class  on  time,  organizing  so  as 
to  spend  limited  time  on  administrative 
duties,  keeping  students  involved  in  content 
activities,  and  monitoring  class  activities, 
achievement  may  increase. 

DEFINITIONS:
EFFICIENT USE OF TIME


BEGINS LESSON PROMPTLY/EFFICIENTLY

Teacher begins lesson promptly.


PROVIDES ACTIVITIES/GIVES DIRECTIONS

Teacher organizes the class and provides directions that keep the
lesson moving, thus eliminating any slowdown in the pace of the
lesson. No student or group of students waits for teacher
direction to be involved in classwork.


MATERIALS IN ORDER/TIME MINIMAL FOR DISTRIBUTION

Teacher has all equipment at the site and distributes it in an
efficient manner. All courts/field areas are marked and nets, etc.
are in place when class is to begin.


ACTIVELY MONITORS ACTIVITIES

The teacher ensures that students are actively and appropriately
involved in the lesson by active observation and involvement.


DELAYS STARTING THE LESSON/HIGH MANAGEMENT

Teacher delays instruction.


DELAYS/STUDENTS WAIT FOR DIRECTIONS/ASSISTANCE/TURNS/
TEACHER PROLONGS TALK

One or more students are not involved in lesson activities due to
the need for teacher directions or assistance.


MATERIALS NOT IMMEDIATELY AVAILABLE/DISORGANIZED

Students must wait to participate because the equipment is not at
the lesson site or the playing area must be set up.


INEFFECTIVE MONITORING OF ACTIVITIES

The teacher does not become actively involved in ensuring that
students are actively and appropriately involved in the lesson.


BEGINS  LESSON  PROMPTLY/EFFICIENTLY 


Tally in the left column if the teacher:

-- Starts the class on time and involves students in classwork
promptly.

Note: "Promptly" may involve the following:

-- Has students involved in classwork before and/or during roll
taking, or

-- Takes roll (including admits, etc.) within one minute and
begins instruction immediately afterwards.

Tally this item only once, at the beginning of the instructional
period.


DELAYS  STARTING  THE  LESSON/HIGH 
MANAGEMENT 

Tally in the right column if the teacher:

-- Arrives at the class site late.

-- Takes longer than one minute for roll call (e.g., calls
students' names individually).

-- Does not involve the students in instructional activities
immediately.

-- Has trouble getting the students' attention in order to begin
instruction.

-- Becomes engrossed in management activities, thus delaying the
start of instruction.
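
The left-column/right-column tally rule for this pair of indicators can be pictured as a small data structure. The following is only an illustrative sketch, not part of the instrument: the class name `TallySheet`, its methods, and the `ONCE_ONLY` set are hypothetical, and the one-tally limit is applied only to the item for which the coding note states it.

```python
from collections import Counter

# Indicator labels copied from the headings above.
EFFECTIVE = "BEGINS LESSON PROMPTLY/EFFICIENTLY"            # left column
INEFFECTIVE = "DELAYS STARTING THE LESSON/HIGH MANAGEMENT"  # right column

# Assumption: only the item whose coding note says "tally this item only once"
# is limited to a single tally per instructional period.
ONCE_ONLY = {EFFECTIVE}


class TallySheet:
    """Hypothetical two-column tally sheet (left = effective, right = ineffective)."""

    def __init__(self):
        self.left = Counter()
        self.right = Counter()

    def tally(self, indicator, column):
        """Record one tally; return False if a once-only item was already tallied."""
        if column not in ("left", "right"):
            raise ValueError("column must be 'left' or 'right'")
        sheet = self.left if column == "left" else self.right
        if indicator in ONCE_ONLY and sheet[indicator] >= 1:
            return False
        sheet[indicator] += 1
        return True


sheet = TallySheet()
sheet.tally(EFFECTIVE, "left")     # recorded
sheet.tally(EFFECTIVE, "left")     # ignored: item is tallied only once
sheet.tally(INEFFECTIVE, "right")  # recorded
print(sheet.left[EFFECTIVE], sheet.right[INEFFECTIVE])
```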


APPENDIX D
COMPREHENSION  CHECK 


COMPREHENSION  CHECK 


Definitions  and/or  class  vignettes  describing 
categories  and/or  indicators  will  be  presented.  Choose  the 
answer  that  best  describes  the  behavior  that  was  presented. 
You may refer to a blank copy of the observation instrument.


1.  This  is  the  first  day  of  badminton.  Mrs.  Marymount 
demonstrated  the  badminton  serve  right  after  completing 
roll  call.  This  is  an  example  of  which  indicator? 

**  A.  Demonstration at beginning of skill
instruction.

B.  Practice  goals  closely  related  to  lesson. 

C.  Out  of  sequence/no  sequence. 

D.  Demonstration  is  incorrect. 

2.  Coach  Johnson  has  worked  hard  to  develop  good  rapport 
with  his  students.  During  a part  of  each  period  he 
spends  some  time  talking  about  the  activities  the 
students  are  involved  in  after  school  through  the 
recreation  department.  This  is  an  example  of  which 
indicator? 

**  A.  Talks/performs  off  subject. 

B.  Provides  activities  and  attends  students. 

C.  Delays  starting  the  lesson. 

D.  Feedback  is  success-oriented/specific  praise. 

3.  Coach  Hardesty  is  teaching  the  overhead  pass  in 
volleyball.  The  students  first  practice  the  skill 
alone,  then  hit  the  ball  to  one  another  in  a circle 
formation  with  a leader  in  the  center,  and  finally 
perform  the  skill  in  a shuttle  relay  formation  (two 
lines,  one  on  either  side  of  a net).  The  assignment  is 
to  hit  the  ball  back  and  forth  over  the  net.  This  is 
an  example  of: 

A.  Practice  is  massed. 

B.  Demonstration  is  sequenced. 

C.  Talks/performs  on  subject  matter. 

**  D.  Practice  under  varying  conditions. 

4.  "Environment and input cues are controlled" could be
described best by which situation?

A.  Mr.  Jacobs  asks  Mr.  Arrow  to  move  his  class 
further  away  in  order  to  have  more  room  for 
the  soccer  game. 

**  B.  It  is  a bright,  hot  day  and  Mr.  Arrow  has  the 
class  gather  under  the  shade  tree  to  discuss 
the  rules  to  be  emphasized  in  today's  class. 

C.  Mr. Jacobs says, "Class, in order to prevent
any tripping, let's take a moment to clear any
rocks or twigs off the field."

D.  Mr.  Jacobs  says  to  John,  "I  don't  like  it  when 
you  are  talking  during  roll  call." 

5.  Mrs.  Razan  has  taken  roll  and  is  ready  to  start  her 
class  and  sends  a student  back  to  her  office  to  get  the 
softballs.  This  is  an  example  of  which  indicator? 

A.  Delays/students  wait  for  directions  or 
assistance,  teacher  prolongs  talk. 

B.  Delays  starting  the  lesson. 

C.  Talks/performs  off  subject. 

**  D.  Materials not immediately available/disorganized.

6.  The  students  in  Mr.  Herbert's  class  are  practicing  a 
one-hand  lay-up  shot  in  basketball.  He  is  moving 
around  the  class.  He  watches  Jason  during  two 
attempts,  but  moves  on  without  saying  anything.  This 
is  an  example  of: 

**  A.  No  feedback/student  ignored. 

B.  Actively  monitors  activities. 

C.  No  attempt  to  control  cues  or  environment. 

D.  Feedback  is  delayed. 

7.  Mrs.  Mason  says,  "Class,  I want  each  squad  to  go  to  the 
court  I assigned  them  at  the  beginning  of  the  period. 
When  you  get  there,  I want  each  member  of  the  squad  to 
make  five  good  serves.  Let's  go."  This  is  an  example 
of  which  indicator? 

A.  Practice  is  distributed. 

B.  Movement  quality  is  monitored. 

C.  Practice  under  varying  conditions. 

D.  Directions  are  given. 


* * 


8.  This teacher behavior should also be coded as:


A.  Practice  goals  stated. 

B.  Does  not  monitor  quality. 

C.  Orients  students  (goals/objectives). 

D.  Safety  points  are  covered. 


9.  Mr. Attridge has taken roll, and then says, "Today
class,  we  are  going  to  go  over  the  scoring  in  tennis 
and  then  you  will  each  get  a chance  to  play  one  or  two 
games."  This  is  an  example  of: 


A.  Orients  students  (goals/objectives). 

B.  Practice  goals  are  given. 

C.  Conducts  review. 

D.  Talks/performs  on  subject. 


10.  At the beginning of class, Mrs. Boyet says, "Yesterday
we  covered  the  sidearm  serve  in  volleyball.  Today  we 
will  work  on  the  overhand  serve."  This  is  an  example 
of  which  indicator? 


A.  Conducts  review. 

B.  Talks/performs  on  subject. 

C.  High  expectation  is  communicated. 

**  D.  None  of  the  above. 

11.  Mr. Babbit has just demonstrated a bowling delivery.
At the conclusion of the demonstration he asks
questions  such  as,  "How  many  steps  should  you  take?" 
"What  grip  did  I recommend?"  and  "Where  should  you  try 
to  aim  the  ball  when  you  release?"  This  is  an  example 
of  which  indicator? 

**  A.  Student comprehension is checked, single
faceted.

B.  Pauses  after  complex  question. 

C.  Demonstration  is  sequenced. 

D.  Movement  quality  is  monitored. 

12.  Mrs. John says, "Okay, class. Now that we've finished
our  exercises,  I want  you  to  sit  down  quickly  right  in 
front  of  me.  While  you  are  coming,  think  of  how  many 
players  were  supposed  to  have  been  in  the  backfield." 
One  of  the  places  this  should  be  coded  is: 

A.  Practice  under  varying  conditions. 

**  B.  Provides  activities. 

C.  Demonstration at beginning of skill
instruction.

D.  Conducts review.

13.  Which statement by Mrs. Campbell would best describe
the indicator "accurate/specific/complete"?

A.  "Good shot."

B.  "Better  luck  next  time." 

**  C.  "Keep  your  eye  on  the  ball  as  it  is  coming 
toward  you." 

D.  "Get  back  in  line,  you  are  misbehaving." 

14.  Mr.  Frick  is  beginning  his  basketball  lesson.  His 
warm-ups  include  having  students  jump  as  high  as  they 
can,  running  a weaving  pattern  through  traffic  cones 
while  dribbling  the  ball,  and  fast  throwing  and 
catching  of  a ball  thrown  against  the  wall.  This  is  an 
example  of  which  indicator? 

**  A.  Warm-ups  closely  related  to  lesson. 

B.  Practice  goals  are  given. 

C.  Practice  under  varying  conditions. 

D.  Provides  activities  and  attends  students. 

15.  Mrs.  Emmer  is  monitoring  her  students  as  they  practice 
a skill.  She  observes  Robert  having  trouble  throwing 
the  ball  through  the  hoop.  She  says,  "Robert,  you  are 
pushing  too  hard  with  your  right  hand.  Try  pushing 
exactly the same with both hands." This could be
coded:

A.  Student  comprehension  is  checked. 

B.  Feedback  is  failure  oriented. 

**  C.  Feedback  is  timely. 

D.  Low  or  no  expectation  level  is  communicated. 




16.  The behavior described above might also be coded as:

**  A.  At student's level of understanding.

B.  Feedback is wrong or imprecise.

C.  Talks/performs on subject.

D.  There is no other indicator appropriate for
this behavior.

17.  Coach  Fulton  is  teaching  a beginning  square  dance  to  a 
group  of  fifth-grade  students.  He  starts  out  the  class 
by  saying,  "Yesterday  we  learned  how  to  find  our 
corner.  Can  anyone  tell  me  how  I told  you  to  do  this? 
Mary."  Mary  gives  the  answer  and  coach  says,  "That's 
right,  just  turn  your  back  on  your  partner."  This 
behavior  might  be  coded  as: 

A.  Orients  students. 

B.  Talks/performs  on  subject. 

C.  Student  comprehension  is  checked,  multiple 
questions . 

**  D.  Conducts  review. 

18.  Mrs.  DeBacy  has  just  demonstrated  the  straddle  vault  in 
gymnastics.  She  then  tells  the  class,  "I  am  now  going 
to  teach  all  of  you  how  to  spot  for  vaulting.  It  is 
very  easy  to  vault  wrong  and  injure  yourself.  I never 
want you to vault without a spotter." This is an
example  of: 

A.  Verbal  cues  are  used. 

B.  Demonstration  is  sequenced. 

**  C.  Safety  points  are  covered. 

D.  Low  expectation  is  communicated. 

19.  The following phrase best communicates what is meant by
"High expectation is communicated":

A.  "Try  to  get  at  least  eight  baskets." 

**  B.  "You  got  six  baskets  yesterday.  Let's  work 
for  eight  baskets  today." 

C.  "If  you  don't  get  at  least  eight  baskets, 
don't  bother  to  report  to  me." 

D.  "Mary,  I don't  think  you  can  get  eight 
baskets.  Why  don't  you  prove  me  wrong?" 

20.  Which of these phrases best communicates what is meant
by "Low or no expectation level is communicated"?

A.  "I  want  you  to  get  at  least  four  baskets." 

**  B.  "Each  squad  go  to  a basket  and  practice 
shooting  free  shots." 

C.  "I  want  you  to  try  for  two  baskets." 

D.  "You  got  two  baskets  yesterday.  Let's  work 
for  three  baskets  today." 


21.  Mrs. Dey, upon observing Sandra missing the softball
during batting practice, says, "Try again." This is an
example of:


A.  No  feedback/student  is  ignored. 

B.  Feedback  is  timely. 

C.  Feedback  is  success  oriented. 

D.  Feedback  is  wrong  or  imprecise. 


22.  As the students are going back to the locker room at
the end of class, Coach Peterson says to Michael, "Hey
Mike, I was watching you today; you need to follow
through more on your serve." This is an example of:

**  A.  Feedback  is  delayed. 

B.  Feedback  is  wrong  or  imprecise. 

C.  Feedback  is  failure  oriented. 

D.  High  expectation  level  is  communicated. 


23.  Coach  Reeve  takes  roll,  demonstrates  the  backstroke, 
and  then  says,  "Okay,  let's  get  in  the  water  and 
practice  the  backstroke."  This  is  an  example  of: 


A.  Practice  goals  are  given. 

B.  No  practice  goals  stated. 

C.  Practice  under  varying  conditions. 

D.  Practice  is  distributed. 


24.  Mrs.  Moxley  is  teaching  the  dribble  in  basketball.  Her 
goal  for  the  day  is  to  get  her  students  to  dribble  with 
either  hand.  Her  lesson  is  structured  in  the  following 
manner.  First,  she  has  the  students  run  down  the 
length  of  the  court  and  back  weaving  in  and  out  of 
traffic  cones.  She  then  has  the  students  go  through 
the  same  cones  running  sideways,  and  changing  sides  at 
each  traffic  cone.  After  this,  she  gives  them  a 
basketball and has them run sideways through the cones,
changing  sides  as  before,  and  changing  the  hand  that 
dribbles  the  ball.  This  could  best  be  described  as: 

**  A.  Components/brief  verbal. 

B.  Practice  is  at  high  success  level. 

C.  Movement  quality  is  monitored. 

D.  Practice  goals  are  given. 

25.  Coach  Yando  directs  his  students  to  start  practicing 
their  table  tennis  serves.  After  watching  them  for 
awhile  he  calls  them  together  and  demonstrates  the 
serve.  This  should  be  coded  as: 

A.  Demonstration  is  sequenced. 

B.  Out  of  sequence/no  sequence. 

C.  Demonstration  is  incorrect. 

**  D.  Demonstrates  after  skill  is  attempted. 

26.  A teacher  who  takes  roll  quickly  and  then  immediately 
gets  the  class  involved  should  be  tallied  as: 

A.  Providing  activities  and  attending  students. 

**  B.  Begins  lesson  promptly. 

C.  Actively  monitors  activities. 

27.  During an archery demonstration the teacher says, "arm
up . . . elbow back . . . string to chin . . . anchor
. . . release . . . hold." This is an example of:

A.  Demonstration  is  sequenced. 

**  B.  Verbal  cues  are  used. 

C.  Demonstration  is  correct. 

D.  Talks/performs  on  subject. 

28.  If  the  playing  field  is  marked  off  before  the  class 
begins,  the  observer  should  tally: 

**  A.  Materials  in  order. 

B.  Begins  classwork  promptly. 

C.  Provides  activities. 

29.  An example of sequencing a golf skill would be:


A.  To  demonstrate  a putt,  then  practice  the  putt, 
next  demonstrate  a chip  and  then  practice  the 
chip,  etc. 

B.  To  arrange  the  class  so  that  one  person  does 
the  movement  sequence  one  at  a time. 

**  C.  To demonstrate the grip, then practice it.
Next the stance, then practice it, then the
backswing, the swing, and finally the
follow-through.

30.  An example of "Movement Quality Is Monitored" would be
when:

A.  The teacher comments to the students that they
are not lining up as quickly as they should.

(30) **  B.  The teacher says to a student (during a
practice session on the breaststroke) "your
glide is too short, make it longer."

(31) **  C.  The teacher observes the students playing and
makes general comments to keep students
on-task.


31.  Which  of  the  above  is  an  example  of  "Actively  Monitors 
Activities"? 


32.  During a soccer scrimmage, the teacher says, "Who do
you think the ball should be thrown to from out-of-bounds?
Who is in the best position to make a play?"
This  is  an  example  of: 


A.  Single  faceted. 

**  B.  Multiple  questions. 

C.  Solicits  immediate  response. 

33.  The teacher wants to check the students' forehand
tennis strokes before going on to another stroke. He
has them line up in front of him and perform the stroke
one at a time. This should be coded as:

A.  Practice-play-practice. 

B.  Practice  is  at  high  success  level. 

C.  High  expectation  level  is  communicated. 

D.  Students  listen/wait  to  participate. 


* * 


APPENDIX  E 
SURVEYS 


SURVEY  OF  TRAINING  MATERIALS 


The  purpose  of  this  survey  is  for  you  to  evaluate 
the  training  materials  used  in  this  observation  system. 
Unless otherwise stated, consider all materials used in the
training process.

Please  place  the  number  of  your  answer  on  the  line 
immediately  to  the  left  of  the  statement.  Please  respond 
according  to  the  following  code: 

1 —  Strongly  agree 

2 —  Agree 

3 —  Undecided 

4 —  Disagree 

5 —  Strongly  disagree 

************************************************************ 

Identifying  Criterion 

The  identifying  criterion  contains  information  which 
enables  the  user  to  decide  if  a specific  instrument  is 
appropriate  for  his  purpose  and  application. 

1.  The  manual  had  a title. 


2.  The  manual  had  a statement  of  purpose. 

3.  There  was  confusion  as  to  what  the  materials 
were  to  be  used  for. 

4.  You  were  given  information  on  the  support 
underlying  the  instruction. 

5.  You  were  given  information  on  the  behaviors  on 
which  the  instrument  is  focused. 

6.  The  applications  for  which  the  instrument  is 
intended  were  specified. 

7.  Any  situations  for  which  the  instrument  should 
not  be  used  were  specified. 

Validity Criterion

The validity criterion contains information on
inference, context, and reliability.

8.  All items on the instrument were as clearly and
unambiguously defined as possible.

9.  Examples helped to eliminate any overlap of
behaviors.

10.  The definitions were consistent with their use
for the theory which they represent.

11.  Indicators on the instrument exhausted the
dimension(s) of teacher behavior (Instructional
Organization and Development) as much as was
practical.

12.  The indicators were not defined as clearly as
possible.

13.  Items listed on the instrument were as low in
the degree of observer inference required as
possible, taking into account the complexity of
the teacher behavior under study.

14.  The nature and extent of observer inference and
methods of reducing and/or controlling it were
explained.

15.  No information on the development of the
instrument and training materials was given.

16.  The problem of context (i.e., physical, social,
behavioral, temporal surroundings) was
explained.

17.  The materials were presented with no
explanation as to their use.

18.  Observer, and other effects, were explained.

19.  Information on the methods employed to test
validity of the instrument, the results
obtained, and the purpose for which these
results apply, was presented.

Criterion of Practicality

This criterion is concerned with the ease of
implementation of a given system, its acceptability to those
under study, the complexity of the data-gathering mechanisms
required, the training procedures entailed, etc.

20.  Indicators on the instrument were relevant for
physical education activity classes.

21.  The coding method was simple, easy to remember,
and convenient to use.

22.  Copies of the instrument, manuals, tapes, and
other materials were not available.

23.  Categories and the coding method were easily
learned.

24.  Items where special qualifications of observers
are required were made clear.

25.  Necessary manuals, tapes, films, or other
training devices were easily available.

26.  The manual contained recommendations as to the
number, location, and functions of observers
and other staff needed in the observation
setting and elsewhere.

27.  Materials were given out with no explanation as
to how to score them and analyze results.

28.  Procedures for analyzing data were described
and discussed.

29.  Costs likely to be incurred in the use of the
instrument were covered.

30.  Behavioral samples used were not relevant to
physical education.
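
Several statements in the survey above are worded negatively (items 3, 12, 15, 17, 22, 27, and 30 on a plain reading), so agreement with them does not signal a favorable rating. The text here does not say how such items were scored. The sketch below assumes simple reverse-keying so that every item runs from 1 (least favorable) to 5 (most favorable); the function name `favorability` and the item set are illustrative assumptions, not the author's scoring procedure.

```python
# Assumption: these item numbers are the negatively worded statements
# in the Survey of Training Materials.
NEGATIVE_ITEMS = {3, 12, 15, 17, 22, 27, 30}


def favorability(item, response):
    """Convert a response (1 = strongly agree ... 5 = strongly disagree)
    into a favorability score where 5 is always the most favorable."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("response must be an integer from 1 to 5")
    if item in NEGATIVE_ITEMS:
        # Disagreeing with a negative statement is favorable.
        return response
    # Agreeing with a positive statement is favorable.
    return 6 - response


# Strong agreement with item 1 ("The manual had a title.") is favorable,
# while strong agreement with item 3 ("There was confusion...") is not.
print(favorability(1, 1))  # 5
print(favorability(3, 1))  # 1
```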


SURVEY  OF  TRAINER  EFFECTIVENESS 


The  purpose  of  this  survey  is  for  you  to  evaluate 
the  instructional  effectiveness  of  the  trainer  who  conducted 
the  session.  This  is  not  a personality  survey. 

Please  place  the  number  of  your  answer  on  the  line 
immediately  to  the  left  of  the  statement.  Please  respond 
according  to  the  following  code: 

1 —  Strongly  agree 

2 —  Agree 

3 —  Undecided 

4 —  Disagree 

5 —  Strongly  disagree 

************************************************************ 

1.  The  trainer  introduced  him-  or  herself. 


2.  The  trainer  presented  a statement  of  purpose. 


3.  The  trainer  gave  you  information  on  the  support 
underlying  the  instrument. 

4.  The  trainer  gave  you  information  on  the 
behaviors  on  which  the  instrument  is  focused. 

5.  The trainer did not orient the participants as
to the purpose of the training and the
instrument.

6.  The  trainer  specified  situations  for  which  the 
instrument  should  not  be  used. 

7.  The  trainer  did  not  discuss  the  behaviors 
described  in  the  indicators. 

8.  The  trainer  did  not  explain  any  applications 
for  which  the  instrument  was  designed. 

9.  The definitions used by the trainer for the
indicators appeared consistent with the theory
they represent.

10.  Ground rules for the implementation of the
instrument and for the categorization of
borderline and/or unusual behaviors were
specified.

11.  No ground rules for the use of the instrument
nor for the categorization of borderline or
unusual behaviors were specified.

12.  The nature and extent of observer inference and
methods of reducing and/or controlling it were
explained.

13.  The nature of inferences that can be made was
described.

14.  The problem of context (i.e., physical, social,
behavioral, temporal surroundings) was
explained.

15.  Observer, and other effects, were explained.

16.  Definitions of behavioral indicators seemed, at
times, to be at variance with the research base
of the instrument.

17.  The problem of context was not explained.

18.  The trainer used examples which were relevant
for physical education activity classes.

19.  Halo and other observer effects were not
covered.

20.  The development of the instrument and manual
was not discussed.

21.  Categories and the coding method were
explained.

22.  Items where special qualifications of observers
are required were made clear.

23.  The trainer had necessary manuals, tapes,
films, or other training devices on site.

24.  Data collection and recording procedures were
discussed.


25.  The  observation  unit  recommended  by  the  system 
was  explained  (categories  and  indicators). 

26.  The  coding  unit  (tallies)  recommended  by  the 
system  was  explained. 

27.  Procedures  for  analyzing  data  were  described 
and  discussed. 

28.  Necessary  materials  were  not  available. 

29.  The  trainer  did  not  discuss  the  difference 
between  categories  and  indicators. 


APPENDIX  F 
DATA  TABLES 


Table F-1

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Comprehension Check


   Group A          Group B          Group C
  (Control):      (Developer):    (Second-Level):
  Raw Score        Raw Score        Raw Score

     25               30               22
     25               28               25
     21               26               24
     24               26               31
     23               27               29
     20               29               21
     24               29               31
     23               27               30
     23               26               30
                      25               30

  N1 = 9           N2 = 10          N3 = 10
  M^a = 23         M = 27.3         M = 27.3
  SD^b = 1.63      SD = 1.60        SD = 3.69
  Range = 20-25    Range = 25-30    Range = 21-30

Note. Criterion score was 32.
^a Of accuracy score.
^b Of raw score.
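
The abstract names ANOVA as the primary means of data analysis. As a rough illustration of what a one-way ANOVA on the comprehension-check scores involves, the sketch below computes the F-ratio from the raw scores transcribed from Table F-1. It is a minimal sketch under stated assumptions, not a reproduction of the author's analysis: the function name `one_way_anova` is hypothetical, and the scores are taken as they appear in this transcription.

```python
from statistics import mean

# Raw comprehension-check scores transcribed from Table F-1.
groups = {
    "A (control)": [25, 25, 21, 24, 23, 20, 24, 23, 23],
    "B (developer)": [30, 28, 26, 26, 27, 29, 29, 27, 26, 25],
    "C (second-level)": [22, 25, 24, 31, 29, 21, 31, 30, 30, 30],
}


def one_way_anova(samples):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for s in samples for x in s]
    grand_mean = mean(all_scores)
    # Variation of group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_scores) - len(samples)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


f_ratio, df_b, df_w = one_way_anova(list(groups.values()))
for name, scores in groups.items():
    print(f"Group {name}: n = {len(scores)}, mean = {mean(scores):.2f}")
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The resulting F-ratio would then be compared with the critical value for the stated alpha level; any follow-up pairwise comparison is outside this sketch.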

Table F-2

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Efficient Use of Time (Effective)


    Group A             Group B             Group C
   (Control):         (Developer):       (Second-Level):
  Raw   Accuracy     Raw   Accuracy     Raw   Accuracy
 Score   Score      Score   Score      Score   Score

   10       9          7      12         10       9
    6      13          9      10         23       4
    3      16          8      11         14       5
    1      18          9      10         18       1
   12       7          9      10         11       8
    2      17         17       2         12       7
    2      17         15       4         23       4
    4      15         17       2         23       4
    2      17         14       5         23       4
                       8      11         13       6

 N1 = 9              N2 = 10             N3 = 10
 M^a = 14.33         M = 7.5             M = 5.2
 SD^b = 3.67         SD = 3.97           SD = 5.29
 Range^b = 1-12      Range = 7-17        Range = 10-23

Note. Criterion score was 19.
^a Of accuracy score.
^b Of raw score.
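
The paired columns in Table F-2 are consistent with an accuracy score equal to the absolute difference between an observer's raw tally and the criterion score, so that a lower accuracy score means closer agreement with the criterion. That definition is inferred from the tabulated values rather than quoted from the text, so the sketch below should be read as an assumption; the function name `accuracy_score` is hypothetical.

```python
from statistics import mean

CRITERION = 19  # criterion score given in the note to Table F-2

# Raw scores for Group A (control), transcribed from Table F-2.
group_a_raw = [10, 6, 3, 1, 12, 2, 2, 4, 2]


def accuracy_score(raw, criterion):
    """Assumed definition: distance of a raw tally from the criterion."""
    return abs(raw - criterion)


group_a_accuracy = [accuracy_score(r, CRITERION) for r in group_a_raw]
print(group_a_accuracy)                  # [9, 13, 16, 18, 7, 17, 17, 15, 17]
print(round(mean(group_a_accuracy), 2))  # 14.33
```

Under this assumption the computed values match the accuracy column and the mean reported for Group A.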

Table F-3

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Efficient Use of Time (Ineffective)


    Group A             Group B             Group C
   (Control):         (Developer):       (Second-Level):
  Raw   Accuracy     Raw   Accuracy     Raw   Accuracy
 Score   Score      Score   Score      Score   Score

    0       2          4       2          1       1
    0       2          7       5          3       1
    3       1          4       2          7       5
    2       0          5       3          3       1
    1       1          5       3          2       2
    3       1          7       5          4       2
    1       1          5       3          4       2
    2       0          4       2          2       0
    5       3          6       4          3       1
                       4       2          7       5

 N1 = 9              N2 = 10             N3 = 10
 M^a = 1.22          M = 3.1             M = 2.0
 SD^b = 1.52         SD = 1.44           SD = 1.86
 Range^b = 0-5       Range = 4-7         Range = 1-7

Note. Criterion score was 2.
^a Of accuracy score.
^b Of raw score.

Table F-4

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Lesson Development (Effective)


    Group A             Group B             Group C
   (Control):         (Developer):       (Second-Level):
  Raw   Accuracy     Raw   Accuracy     Raw   Accuracy
 Score   Score      Score   Score      Score   Score

   15       5         21       1         10      10
    8      12         10      10         20       0
   17       3         19       1         17       3
    7      13         18       2         22       2
    3      17         19       1         15       5
    4      16         17       3          6      14
    8      12         29       9         15       5
    6      14         24       4         25       5
    8      12         16       4         19       1
                      15       5         15       5

 N1 = 9              N2 = 10             N3 = 10
 M^a = 11.56         M = 4               M = 5
 SD^b = 4.40         SD = 7.55           SD = 5.30
 Range^b = 3-17      Range = 10-29       Range = 6-25

Note. Criterion score was 20.
^a Of accuracy score.
^b Of raw score.

Table F-5

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Lesson Development (Ineffective)


    Group A             Group B             Group C
   (Control):         (Developer):       (Second-Level):
  Raw   Accuracy     Raw   Accuracy     Raw   Accuracy
 Score   Score      Score   Score      Score   Score

    3       6          9       0          6       3
    1       8          0       9         14       5
    7       2         11       2          6       3
   13       4          6       3          8       1
    5       4          8       1          9       0
    0       9          9       0          5       4
    2       7         13       4          7       2
    1       8         15       6          7       2
    8       1          9       0          7       2
                       8       1          7       2

 N1 = 9              N2 = 10             N3 = 10
 M^a = 5.44          M = 2.6             M = 2.4
 SD^b = 4.00         SD = 3.84           SD = 2.37
 Range^b = 1-13      Range = 0-15        Range = 5-14

Note. Criterion score was 9.
^a Of accuracy score.
^b Of raw score.

Table F-6

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Demonstrations (Effective)


    Group A             Group B             Group C
   (Control):         (Developer):       (Second-Level):
  Raw   Accuracy     Raw   Accuracy     Raw   Accuracy
 Score   Score      Score   Score      Score   Score

   13       8          5      16         22       1
   14       7          4      17         18       3
    1      20          9      12          5      16
    2      19         10      11          6      15
    6      15          9      12         14       7
   10      11         15       6         11      10
    4      17          5      16         15       6
   10      11         13       8          8      13
    2      19         12       9         12       9
                       9      12          6      15

 N1 = 9              N2 = 10             N3 = 10
 M^a = 14.11         M = 11.9            M = 9.5
 SD^b = 4.70         SD = 3.45           SD = 5.35
 Range^b = 1-14      Range = 4-15        Range = 5-22

Note. Criterion score was 21.
^a Of accuracy score.
^b Of raw score.
Table F-7

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Demonstrations (Ineffective)

    Group A                 Group B                 Group C
  (Control):             (Developer):           (Second-Level):
  Raw  Accuracy           Raw  Accuracy           Raw  Accuracy
Score     Score         Score     Score         Score     Score

    2         2             0         0             0         0
    0         0             1         1             0         0
    1         1             1         1             4         4
    2         2             1         1             0         0
    0         0             1         1             5         5
    1         1             1         1             0         0
    3         3             2         2             2         2
    0         0             0         0             1         1
    4         4             1         1             0         0
                            0         0             4         4

N1 = 9                  N2 = 10                 N3 = 10
M^a = 1.44              M = 0.8                 M = 1.6
SD^b = 1.34             SD = 0.6                SD = 1.91
Range^b = 0-4           Range = 0-2             Range = 0-5

Note. Criterion score was 0.
^a Of accuracy score.
^b Of raw score.
Table F-8

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Supervised Performance (Effective)

    Group A                 Group B                 Group C
  (Control):             (Developer):           (Second-Level):
  Raw  Accuracy           Raw  Accuracy           Raw  Accuracy
Score     Score         Score     Score         Score     Score

   15         2            11         6            10         7
   12         5             6        11            19         2
    4        13            12         5            14         3
    2        15             5        12             7        10
    4        13             3        14             7        10
    1        16             6        11            13         4
    0        17            11         6             7        10
    4        13            22         5            10         7
    2        15            12         5            14         3
                            8         9             1        16

N1 = 9                  N2 = 10                 N3 = 10
M^a = 12.1              M = 8.4                 M = 7.2
SD^b = 4.84             SD = 5.12               SD = 4.79
Range^b = 0-15          Range = 3-22            Range = 1-19

Note. Criterion score was 17.
^a Of accuracy score.
^b Of raw score.
Table F-9

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Supervised Performance (Ineffective)

    Group A                 Group B                 Group C
  (Control):             (Developer):           (Second-Level):
  Raw  Accuracy           Raw  Accuracy           Raw  Accuracy
Score     Score         Score     Score         Score     Score

    1         1             1         1             0         2
    0         2             4         2             1         1
    4         2             6         4             1         1
    1         1             3         1             0         2
    1         1             8         6             0         2
    2         0             5         3             0         2
    0         2             3         1             2         0
    1         1             4         2             0         2
    7         5             2         0             0         2
                            2         0             4         2

N1 = 9                  N2 = 10                 N3 = 10
M^a = 1.7               M = 2.0                 M = 1.6
SD^b = 2.08             SD = 1.99               SD = 1.25
Range^b = 0-7           Range = 1-8             Range = 0-4

Note. Criterion score was 2.
^a Of accuracy score.
^b Of raw score.
Table F-10

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Feedback (Effective)

    Group A                 Group B                 Group C
  (Control):             (Developer):           (Second-Level):
  Raw  Accuracy           Raw  Accuracy           Raw  Accuracy
Score     Score         Score     Score         Score     Score

   10         6             3         1             4         0
    9         5             4         0             9         5
    2         2            11         7             3         1
    3         1             6         2             1         3
    2         2             5         1             8         4
    2         2             5         1             3         1
    2         2             6         2            10         6
    5         1            24        20             2         2
    2         2             3         1            20        16
                            1         3             1         3

N1 = 9                  N2 = 10                 N3 = 10
M^a = 2.6               M = 3.8                 M = 4.1
SD^b = 3.03             SD = 6.26               SD = 5.59
Range^b = 2-10          Range = 1-24            Range = 1-20

Note. Criterion score was 4.
^a Of accuracy score.
^b Of raw score.
Table F-11

Measures Obtained in Three Random Samples
after Conditions of No Training, Developer Training,
and Second-Level Training:
Feedback (Ineffective)

    Group A                 Group B                 Group C
  (Control):             (Developer):           (Second-Level):
  Raw  Accuracy           Raw  Accuracy           Raw  Accuracy
Score     Score         Score     Score         Score     Score

    3         4             4         3             4         3
    0         7             2         5             8         1
    3         4             7         0             6         1
    1         6             3         4             7         0
    2         5             4         3             3         4
    0         7             2         5             3         4
    1         6             5         2             5         2
    2         5             6         1             5         2
    4         3             2         5             7         0
                            5         2            12         5

N1 = 9                  N2 = 10                 N3 = 10
M^a = 5.2               M = 3.0                 M = 2.2
SD^b = 1.32             SD = 1.67               SD = 1.41
Range^b = 0-4           Range = 2-7             Range = 3-12

Note. Criterion score was 7.
^a Of accuracy score.
^b Of raw score.
Table F-12

Range and Mean Scores of Two Random Samples
after Training by Developer and
by Second-Level Trainer:
Survey on Training Materials
(Positively Phrased Questions)

             Group B             Group C
           (Developer         (Second-Level
Question    Trained):           Trained):          Grand
 Number   Range    Mean       Range    Mean        Mean^a    t-test

(A score of 1 represents strong agreement with a positively phrased question)

    1     1-2      1.1        1        1           1
    2     1        1          1        1           1
    4     1-3      1.4        1-2      1.3         1
    5     1-2      1.1        1-4      1.6         1         1.52
    6     1-2      1.1        1-2      1.1         1
    7     1-2      1.4        1-4      1.7         2
    8     1-4      1.9        1-4      1.8         2
    9     1-2      1.5        1-2      1.4         1
   10     1-3      1.5        1-2      1.5         2
   11     1-4      1.9        1-3      1.7         2
   13     1-3      1.6        1-4      2.1         2         0.64
   14     1-4      1.7        1-2      1.5         2
   16     1-3      1.8        1-5      2.7         2         1.58
   18     1-2      1.3        1-3      1.6         1
   19     1-5      2.2        1-5      2.0         2
   20     1-5      1.8        1-2      1.1         1         1.27
   21     1-5      2.6        1-4      1.8         2         1.40
   23     1-4      2.1        1-4      2.1         2
   24     1-3      2.0        1-3      2.1         1
   25     1-2      1.3        1-5      1.8         2         1.07
   26     1-3      2.2        1-4      2.4         2
   28     1-4      1.9        1-4      1.5         2
   29     1-2      1.5        1-5      4.1         3         5.10*

Note. A t-test for independent samples was conducted when
the difference between the mean of the two groups was
.5 or greater.

^a Rounded off to nearest whole-number score for easy
comparison.

*Significant at .001.
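
The screening rule in the note (test only when the two group means differ by .5 or more) and the t-test for independent samples can be sketched as follows. This is a minimal illustration that assumes the pooled-variance (equal-variance) form of the test; the appendix does not say which form was used. The individual responses below are hypothetical and are not data from the study, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def needs_t_test(mean_b, mean_c, threshold=0.5):
    """Rule from the table note: test only when the means differ by .5 or more."""
    # A small tolerance guards against floating-point error in differences such as 1.6 - 1.1.
    return abs(mean_b - mean_c) >= threshold - 1e-9


def independent_t(sample_1, sample_2):
    """Pooled-variance t statistic for two independent samples (assumed form)."""
    n1, n2 = len(sample_1), len(sample_2)
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / standard_error


# Hypothetical responses on the 1-5 agreement scale; NOT taken from the study.
group_b = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
group_c = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]

if needs_t_test(mean(group_b), mean(group_c)):
    t = independent_t(group_b, group_c)
    print(f"t = {t:.2f}, df = {len(group_b) + len(group_c) - 2}")
```

With two groups of 10 respondents the test has 18 degrees of freedom, so only a large t value reaches the .001 level.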
Table F-13

Range and Mean Scores of Two Random Samples
after Training by Developer and
by Second-Level Trainer:
Survey on Training Materials
(Negatively Phrased Questions)

             Group B             Group C
           (Developer         (Second-Level
Question    Trained):           Trained):          Grand
 Number   Range    Mean       Range    Mean        Mean^a    t-test

(A score of 5 represents strong disagreement with a negatively phrased question)

    3     5-4      4.8        5-1      4.3         5
   12     5-1      3.9        5-2      3.8         4
   15     5-4      4.8        5-1      4.0         4         1.43
   17     5        5          5-3      4.8         5
   22     5        5          5-1      4.6         5
   27     5-4      4.8        5-1      4.0         4
   30     5-4      4.8        5-1      4.0         4         1.54

Note. A t-test for independent samples was conducted when
the difference between the mean of the two groups was
.5 or greater.

^a Rounded off to nearest whole-number score for easy
comparison.
Table F-14

Range and Mean Scores of Two Random Samples
after Training by Developer and
by Second-Level Trainer:
Survey on Trainer Effectiveness
(Positively Phrased Questions)

             Group B             Group C
           (Developer         (Second-Level
Question    Trained):           Trained):          Grand
 Number   Range    Mean       Range    Mean        Mean^a    t-test

(A score of 1 represents strong agreement with a positively phrased question)

    1     1-3      1.3        1        1           1
    2     1-2      1.1        1        1           1
    3     1-2      1.1        1-2      1.2         1
    4     1-2      1.2        1-3      1.4         1
    6     1-2      1.3        1-4      1.7         2
    9     1-2      1.3        1-2      1.2         1
   10     1-2      1.3        1-4      1.4         1
   12     1-2      1.4        1-3      1.6         2
   13     1-2      1.4        1-4      1.8         2
   14     1-3      1.5        1-5      2.7         2         0.59
   15     1-2      1.1        1-2      1.1         1
   18     1-2      1.1        1-2      1.1         1
   21     1-2      1.1        1        1           1
   22     1-3      1.6        1-5      2.3         2         1.32
   23     1-2      1.1        1        1           1
   24     1-2      1.2        1-4      1.4         1
   25     1-2      1.3        1-3      1.4         1
   26     1-2      1.2        1-2      1.2         1
   27     1-4      1.9        1-2      1.5         2

Note. A t-test for independent samples was conducted when
the difference between the mean of the two groups was
.5 or greater.

^a Rounded off to nearest whole-number score for easy
comparison.
Table F-15

Range and Mean Scores of Two Random Samples
after Training by Developer and
by Second-Level Trainer:
Survey on Trainer Effectiveness
(Negatively Phrased Questions)

             Group B             Group C
           (Developer         (Second-Level
Question    Trained):           Trained):          Grand
 Number   Range    Mean       Range    Mean        Mean^a    t-test

(A score of 1 represents strong disagreement with a negatively phrased question)

    5     5-4      4.9        5-1      4.1         4         1.48
    7     5-4      4.9        5-4      4.8         5
    8     5-4      4.9        5-4      4.8         5
   11     5-3      4.7        5-1      4.2         4         0.86
   16     5-1      3.5        5-1      4.0         4         0.83
   17     5-1      3.6        5-1      2.9         3         1.03
   19     5-4      4.6        5-1      4.5         5
   20     5-1      4.4        5-1      4.0         4
   28     5-4      4.8        5-1      4.9         5
   29     5-4      4.7        5-1      4.8         5

Note. A t-test for independent samples was conducted when
the difference between the mean of the two groups was
.5 or greater.

^a Rounded off to nearest whole-number score for easy
comparison.